annotate pylearn/algorithms/sigmoid_output_SdA.py @ 1476:8c10bda4bb5f

Configured default train/valid/test split for icml07.MNIST_rotated_background dataset. Defaults are the ones used by Hugo in the ICML07 paper and in all contracting auto-encoder papers.
author gdesjardins
date Fri, 20 May 2011 16:53:00 -0400
parents daa355332b66
children
"""
This tutorial introduces stacked denoising auto-encoders (SdA) using Theano.

Denoising autoencoders are the building blocks for SdA.
They are based on auto-encoders such as the ones used in Bengio et al. 2007.
An autoencoder takes an input x and first maps it to a hidden representation
y = f_{\theta}(x) = s(Wx+b), parameterized by \theta={W,b}. The resulting
latent representation y is then mapped back to a "reconstructed" vector
z \in [0,1]^d in input space, z = g_{\theta'}(y) = s(W'y + b'). The weight
matrix W' can optionally be constrained such that W' = W^T, in which case
the autoencoder is said to have tied weights. The network is trained to
minimize the reconstruction error (the error between x and z).

For the denoising autoencoder, during training, x is first corrupted into
\tilde{x}, where \tilde{x} is a partially destroyed version of x obtained
by means of a stochastic mapping. Afterwards y is computed as before (using
\tilde{x}), y = s(W\tilde{x} + b), and z as s(W'y + b'). The reconstruction
error is now measured between z and the uncorrupted input x, and is
computed as the cross-entropy:

    - \sum_{k=1}^d [ x_k \log z_k + (1-x_k) \log(1-z_k) ]


References:
 - P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol: Extracting and
   Composing Robust Features with Denoising Autoencoders, ICML'08, 1096-1103,
   2008
 - Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle: Greedy Layer-Wise
   Training of Deep Networks, Advances in Neural Information Processing
   Systems 19, 2007

"""

import numpy, time, cPickle, gzip, sys, os

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

from logistic_sgd import load_data
from mlp import HiddenLayer
from dA import dA


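# The following helper is not part of the tutorial code: it is a minimal
# numpy sketch of the denoising criterion described in the module docstring,
# assuming zero-masking corruption and tied weights (W' = W^T). The name
# `_example_denoising_cost` and its arguments are ours, for illustration only.
def _example_denoising_cost(x, W, b, b_prime, corruption_level, numpy_rng):
    sigmoid = lambda u: 1. / (1. + numpy.exp(-u))
    # corrupt x by zeroing each component with probability `corruption_level`
    tilde_x = x * numpy_rng.binomial(n=1, p=1. - corruption_level, size=x.shape)
    y = sigmoid(numpy.dot(tilde_x, W) + b)    # hidden code from corrupted input
    z = sigmoid(numpy.dot(y, W.T) + b_prime)  # reconstruction (tied weights)
    # cross-entropy measured against the *uncorrupted* input x
    return numpy.mean(-numpy.sum(x * numpy.log(z)
                                 + (1 - x) * numpy.log(1 - z), axis=1))

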
class BinaryLogisticRegressions(object):
    """Multiple 2-class Logistic Regressions Class

    The logistic regressions are fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regressions

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """

        # initialize the weights W with 0s, as a matrix of shape (n_in, n_out)
        self.W = theano.shared(value=numpy.zeros((n_in, n_out),
                                                 dtype=theano.config.floatX),
                               name='W')
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(value=numpy.zeros((n_out,),
                                                 dtype=theano.config.floatX),
                               name='b')

        # compute vector of class-membership probabilities in symbolic form
        self.p_y_given_x = T.nnet.sigmoid(T.dot(input, self.W) + self.b)

        # compute prediction as the class whose probability is maximal, in
        # symbolic form
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # parameters of the model
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} \log(P(Y^{(i)}=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
        the learning rate is less dependent on the batch size
        """
        return -T.mean(T.sum(y * T.log(self.p_y_given_x)
                             + (1 - y) * T.log(1 - self.p_y_given_x), axis=1))

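    # Worked example (ours, not from the tutorial): with two sigmoid outputs,
    # binary targets y = [1, 0] and predicted probabilities
    # p_y_given_x = [0.9, 0.2], the per-example cost above is
    #     -(log(0.9) + log(1 - 0.2)) = -(log 0.9 + log 0.8) ~= 0.3285,
    # i.e. the sum of independent binary cross-entropies, which T.mean then
    # averages over the minibatch. Note that `y` here is a matrix of 0/1
    # targets (one column per output), unlike the integer label vector that
    # `errors` below compares against y_pred.
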
    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch; zero-one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has the same dimension as y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError('y should have the same shape as self.y_pred',
                            ('y', y.type, 'y_pred', self.y_pred.type))
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


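# A short usage sketch (ours): compile the symbolic graph of the layer above
# and run it on random data. None of these names appear in the tutorial; the
# helper is never called by the module itself.
def _example_binary_logreg_usage():
    x = T.matrix('x')
    layer = BinaryLogisticRegressions(input=x, n_in=4, n_out=3)
    predict = theano.function([x], [layer.p_y_given_x, layer.y_pred])
    data = numpy.random.RandomState(0).uniform(size=(2, 4)).astype(
        theano.config.floatX)
    probs, preds = predict(data)
    # with zero-initialized W and b every probability is sigmoid(0) = 0.5,
    # so `preds` is simply class 0 for every example
    return probs, preds

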
class SdA(object):
    """Stacked denoising auto-encoder class (SdA)

    A stacked denoising autoencoder model is obtained by stacking several
    dAs. The hidden layer of the dA at layer `i` becomes the input of
    the dA at layer `i+1`. The first layer dA gets as input the input of
    the SdA, and the hidden layer of the last dA represents the output.
    Note that after pretraining, the SdA is dealt with as a normal MLP,
    the dAs are only used to initialize the weights.
    """

    def __init__(self, numpy_rng, theano_rng=None, n_ins=784,
                 hidden_layers_sizes=[500, 500], n_outs=10,
                 corruption_levels=[0.1, 0.1]):
        """ This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial
                          weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `numpy_rng`

        :type n_ins: int
        :param n_ins: dimension of the input to the SdA

        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers sizes, must contain
                                    at least one value

        :type n_outs: int
        :param n_outs: dimension of the output of the network

        :type corruption_levels: list of float
        :param corruption_levels: amount of corruption to use for each
                                  layer
        """

        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')   # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as a 1D vector of
                                 # [int] labels

        # The SdA is an MLP, for which all weights of intermediate layers
        # are shared with different denoising autoencoders.
        # We will first construct the SdA as a deep multilayer perceptron,
        # and when constructing each sigmoidal layer we also construct a
        # denoising autoencoder that shares weights with that layer.
        # During pretraining we will train these autoencoders (which will
        # lead to changing the weights of the MLP as well).
        # During finetuning we will finish training the SdA by doing
        # stochastic gradient descent on the MLP.

        for i in xrange(self.n_layers):
            # construct the sigmoidal layer

            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if you are on the first
            # layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)
            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)
            # it's arguably a philosophical question...
            # but we are going to only declare that the parameters of the
            # sigmoid_layers are parameters of the SdA;
            # the visible biases in the dA are parameters of those
            # dAs, but not of the SdA
            self.params.extend(sigmoid_layer.params)

            # Construct a denoising autoencoder that shares weights with this
            # layer
            dA_layer = dA(numpy_rng=numpy_rng, theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W, bhid=sigmoid_layer.b)
            self.dA_layers.append(dA_layer)


        # We now need to add a logistic layer on top of the MLP
        #self.logLayer = LogisticRegression(\
        #                 input = self.sigmoid_layers[-1].output,\
        #                 n_in = hidden_layers_sizes[-1], n_out = n_outs)

        self.logLayer = BinaryLogisticRegressions(
                         input=self.sigmoid_layers[-1].output,
                         n_in=hidden_layers_sizes[-1], n_out=n_outs)

        self.params.extend(self.logLayer.params)
        # construct a function that implements one step of finetuning

        # compute the cost for the second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)

        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)

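    # Construction sketch (ours; the values mirror the constructor defaults):
    #
    #   numpy_rng = numpy.random.RandomState(123)
    #   sda = SdA(numpy_rng=numpy_rng, n_ins=784,
    #             hidden_layers_sizes=[500, 500], n_outs=10,
    #             corruption_levels=[0.1, 0.1])
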
    def pretraining_functions(self, train_set_x, batch_size):
        ''' Generates a list of functions, each of them implementing one
        step in training the dA corresponding to the layer with same index.
        The function will require as input the minibatch index, and to train
        a dA you just need to iterate, calling the corresponding function on
        all minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared variable that contains all datapoints used
                            for training the dA

        :type batch_size: int
        :param batch_size: size of a [mini]batch

        :type learning_rate: float
        :param learning_rate: learning rate used during training for any of
                              the dA layers
        '''

        # index to a [mini]batch
        index = T.lscalar('index')  # index to a minibatch
        corruption_level = T.scalar('corruption')  # amount of corruption to use
        learning_rate = T.scalar('lr')  # learning rate to use
        # number of batches
        n_batches = train_set_x.value.shape[0] / batch_size
        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch, given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA in self.dA_layers:
            # get the cost and the updates list
            cost, updates = dA.get_cost_updates(corruption_level, learning_rate)
            # compile the theano function
            fn = theano.function(inputs=[index,
                                         theano.Param(corruption_level, default=0.2),
                                         theano.Param(learning_rate, default=0.1)],
                                 outputs=cost,
                                 updates=updates,
                                 givens={self.x: train_set_x[batch_begin:batch_end]})
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns
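    # Typical use of the returned functions (a sketch; `corruption_levels`,
    # `pretraining_epochs`, `pretrain_lr` and `n_train_batches` are the
    # caller's choice):
    #
    #   pretraining_fns = sda.pretraining_functions(train_set_x, batch_size)
    #   for i in xrange(sda.n_layers):
    #       for epoch in xrange(pretraining_epochs):
    #           c = [pretraining_fns[i](index=batch_index,
    #                                   corruption=corruption_levels[i],
    #                                   lr=pretrain_lr)
    #                for batch_index in xrange(n_train_batches)]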


    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on
        a batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that contains all the datasets;
                         it has to contain three pairs, `train`,
                         `valid`, `test`, in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels

        :type batch_size: int
        :param batch_size: size of a minibatch

        :type learning_rate: float
        :param learning_rate: learning rate used during finetune stage
        '''

        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.value.shape[0] / batch_size
        n_test_batches = test_set_x.value.shape[0] / batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = {}
        for param, gparam in zip(self.params, gparams):
            updates[param] = param - gparam * learning_rate

        train_fn = theano.function(inputs=[index],
              outputs=self.finetune_cost,
              updates=updates,
              givens={
                self.x: train_set_x[index * batch_size:(index + 1) * batch_size],
                self.y: train_set_y[index * batch_size:(index + 1) * batch_size]})

        test_score_i = theano.function([index], self.errors,
                 givens={
                   self.x: test_set_x[index * batch_size:(index + 1) * batch_size],
                   self.y: test_set_y[index * batch_size:(index + 1) * batch_size]})

        valid_score_i = theano.function([index], self.errors,
                 givens={
                   self.x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                   self.y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in xrange(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in xrange(n_test_batches)]

        return train_fn, valid_score, test_score
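    # Sketch of how the three returned callables fit together (ours; the
    # early-stopping logic of the full tutorial loop is omitted):
    #
    #   train_fn, valid_score, test_score = sda.build_finetune_functions(
    #       datasets, batch_size, finetune_lr)
    #   for epoch in xrange(training_epochs):
    #       for minibatch_index in xrange(n_train_batches):
    #           train_fn(minibatch_index)
    #       this_validation_loss = numpy.mean(valid_score())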




def test_SdA(finetune_lr=0.1, pretraining_epochs=15,
             pretrain_lr=0.05, training_epochs=1000,
             dataset='../data/mnist.pkl.gz', batch_size=1):
    """
    Demonstrates how to train and test a stacked denoising autoencoder.

    This is demonstrated on MNIST.

    :type finetune_lr: float
    :param finetune_lr: learning rate used in the finetune stage
                        (factor for the stochastic gradient)

    :type pretraining_epochs: int
    :param pretraining_epochs: number of epochs to do pretraining

    :type pretrain_lr: float
    :param pretrain_lr: learning rate to be used during pre-training

    :type training_epochs: int
    :param training_epochs: maximal number of iterations to run the optimizer

    :type dataset: string
    :param dataset: path to the pickled dataset

    """
daa355332b66 added sigmoid_output_SdA.py
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x,  test_set_y  = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.value.shape[0] / batch_size
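    # `train_set_x.value` reads the contents of the Theano shared variable,
    # as in the Theano releases this file targets; newer releases expose the
    # same data as `train_set_x.get_value()`. Note also the Python 2 integer
    # division: a trailing partial minibatch is silently dropped.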

    # numpy random generator
    numpy_rng = numpy.random.RandomState(123)
    print '... building the model'
    # construct the stacked denoising autoencoder class
    sda = SdA(numpy_rng = numpy_rng, n_ins = 28 * 28,
              hidden_layers_sizes = [1000, 1000, 1000],
              n_outs = 10)

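    # For reference, the network built above has 28*28 = 784 inputs, three
    # hidden layers of 1000 units each, and 10 output units (one per MNIST
    # class); as the file name suggests, this variant uses a sigmoid output
    # layer rather than the usual softmax.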

    #########################
    # PRETRAINING THE MODEL #
    #########################
    print '... getting the pretraining functions'
    pretraining_fns = sda.pretraining_functions(train_set_x = train_set_x,
                                                batch_size = batch_size)

    print '... pre-training the model'
    start_time = time.clock()
    ## Pre-train layer-wise
    corruption_levels = [.1, .1, .0]
    for i in xrange(sda.n_layers):
        # go through pretraining epochs
        for epoch in xrange(pretraining_epochs):
            # go through the training set
            c = []
            for batch_index in xrange(n_train_batches):
                c.append(pretraining_fns[i](index = batch_index,
                                            corruption = corruption_levels[i],
                                            lr = pretrain_lr))
            print 'Pre-training layer %i, epoch %d, cost ' % (i, epoch), numpy.mean(c)

    end_time = time.clock()

    print >> sys.stderr, ('The pretraining code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm (expected 4.58m in our buildbot)' %
                          ((end_time - start_time) / 60.))
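    # Rough cost of the loop above, assuming the standard mnist.pkl.gz split
    # (50000 training examples) and the default batch_size = 1: each of the
    # 3 layers performs pretraining_epochs * n_train_batches
    # = 15 * 50000 = 750000 parameter updates.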

    ########################
    # FINETUNING THE MODEL #
    ########################

    # get the training, validation and testing function for the model
    print '... getting the finetuning functions'
    train_fn, validate_model, test_model = sda.build_finetune_functions(
        datasets = datasets, batch_size = batch_size,
        learning_rate = finetune_lr)

    print '... finetuning the model'
    # early-stopping parameters
    patience = 10 * n_train_batches  # look at this many examples regardless
    patience_increase = 2.           # wait this much longer when a new best
                                     # is found
    improvement_threshold = 0.995    # a relative improvement of this much is
                                     # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                     # go through this many minibatches before
                                     # checking the network on the validation
                                     # set; in this case we check every epoch

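    # Worked numbers for the defaults above, again assuming 50000 training
    # examples and batch_size = 1: n_train_batches = 50000, so
    # patience = 500000 iterations and
    # validation_frequency = min(50000, 250000) = 50000, i.e. the model is
    # validated exactly once per epoch.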

    best_params = None
    best_validation_loss = float('inf')
    test_score = 0.
    start_time = time.clock()

    done_looping = False
    epoch = 0

    while (epoch < training_epochs) and (not done_looping):
        for minibatch_index in xrange(n_train_batches):
            minibatch_avg_cost = train_fn(minibatch_index)
            iter = epoch * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                validation_losses = validate_model()
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                            improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = test_model()
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break
        epoch = epoch + 1
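    # Example of the patience rule in the loop above, under the same assumed
    # defaults: if a sufficiently improved validation score arrives at
    # iter = 300000, patience becomes max(500000, 300000 * 2.) = 600000.,
    # i.e. 100000 iterations (two epochs' worth) more than before.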

    end_time = time.clock()
    print(('Optimization complete with best validation score of %f %%, '
           'with test performance %f %%') %
          (best_validation_loss * 100., test_score * 100.))
    print >> sys.stderr, ('The training code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm (expected 3.91m in our buildbot)' %
                          ((end_time - start_time) / 60.))


if __name__ == '__main__':
    test_SdA()
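
# For a quick smoke test one might instead call, for instance:
#     test_SdA(pretraining_epochs = 1, training_epochs = 2, batch_size = 20)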