ift6266: code_tutoriel/mlp.py annotate

annotate code_tutoriel/mlp.py @ 618:14ba0120baff

review response changes

author	Yoshua Bengio <bengioy@iro.umontreal.ca>
date	Sun, 09 Jan 2011 14:13:23 -0500
parents	4bc5eeec6394
children

rev	line source
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	1 """
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	2 This tutorial introduces the multilayer perceptron using Theano.
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	3
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	4 A multilayer perceptron is a logistic regressor where
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	5 instead of feeding the input to the logistic regression you insert an
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	6 intermediate layer, called the hidden layer, that has a nonlinear
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	7 activation function (usually tanh or sigmoid). One can use many such
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	8 hidden layers making the architecture deep. The tutorial will also tackle
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	9 the problem of MNIST digit classification.
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	10
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	11 .. math::
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	12
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	13 f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	14
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	15 References:
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	16
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	17 - textbooks: "Pattern Recognition and Machine Learning" -
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	18 Christopher M. Bishop, chapter 5
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	19
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	20 """
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	21 __docformat__ = 'restructuredtext en'
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	22
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	23
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	24 import numpy, time, cPickle, gzip
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	25
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	26 import theano
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	27 import theano.tensor as T
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	28
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	29
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	30 from logistic_sgd import LogisticRegression, load_data
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	31
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	32
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	33 class HiddenLayer(object):
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	34 def __init__(self, rng, input, n_in, n_out, activation = T.tanh):
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	35 """
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	36 Typical hidden layer of a MLP: units are fully-connected and have
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	37 sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	38 and the bias vector b is of shape (n_out,).
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	39
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	40 NOTE : The nonlinearity used by default is tanh
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	41
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	42 Hidden unit activation is given by: tanh(dot(input,W) + b)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	43
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	44 :type rng: numpy.random.RandomState
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	45 :param rng: a random number generator used to initialize weights
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	46
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	47 :type input: theano.tensor.dmatrix
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	48 :param input: a symbolic tensor of shape (n_examples, n_in)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	49
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	50 :type n_in: int
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	51 :param n_in: dimensionality of input
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	52
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	53 :type n_out: int
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	54 :param n_out: number of hidden units
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	55
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	56 :type activation: theano.Op or function
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	57 :param activation: Non linearity to be applied in the hidden
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	58 layer
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	59 """
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	60 self.input = input
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	61
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	62 # `W` is initialized with `W_values`, which is uniformly sampled
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	63 # between -sqrt(6./(n_in+n_out)) and sqrt(6./(n_in+n_out));
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	64 # the output of uniform is converted using asarray to dtype
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	65 # theano.config.floatX so that the code is runnable on GPU
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	66 W_values = numpy.asarray( rng.uniform( \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	67 low = -numpy.sqrt(6./(n_in+n_out)), \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	68 high = numpy.sqrt(6./(n_in+n_out)), \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	69 size = (n_in, n_out)), dtype = theano.config.floatX)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	70 self.W = theano.shared(value = W_values)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	71
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	72 b_values = numpy.zeros((n_out,), dtype= theano.config.floatX)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	73 self.b = theano.shared(value= b_values)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	74
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	75 self.output = activation(T.dot(input, self.W) + self.b)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	76 # parameters of the model
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	77 self.params = [self.W, self.b]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	78
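The initialization and forward pass of the HiddenLayer class above can be sketched without Theano. This is only an illustrative NumPy stand-in, not the tutorial's code: the helper names init_hidden_weights and hidden_forward are hypothetical, and the computation is eager rather than symbolic.

```python
import numpy as np


def init_hidden_weights(rng, n_in, n_out):
    """Hypothetical helper: sample W uniformly in [-bound, bound] with
    bound = sqrt(6 / (n_in + n_out)), and start the bias at zero."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
    b = np.zeros((n_out,))
    return W, b


def hidden_forward(x, W, b, activation=np.tanh):
    """Hypothetical helper: activation(dot(x, W) + b) for a minibatch x
    of shape (n_examples, n_in)."""
    return activation(np.dot(x, W) + b)


rng = np.random.RandomState(1234)
W, b = init_hidden_weights(rng, n_in=28 * 28, n_out=500)
x = rng.uniform(size=(20, 28 * 28))  # stand-in for one minibatch of 20 images
h = hidden_forward(x, W, b)
print(h.shape)
```

The bias broadcasts over the minibatch dimension, which is why b has shape (n_out,) while the output has shape (n_examples, n_out).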
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	79
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	80 class MLP(object):
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	81 """Multi-Layer Perceptron Class
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	82
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	83 A multilayer perceptron is a feedforward artificial neural network model
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	84 that has one layer or more of hidden units and nonlinear activations.
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	85 Intermediate layers usually have as activation function tanh or the
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	86 sigmoid function (defined here by a ``HiddenLayer`` class) while the
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	87 top layer is a softmax layer (defined here by a ``LogisticRegression``
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	88 class).
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	89 """
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	90
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	91
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	92
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	93 def __init__(self, rng, input, n_in, n_hidden, n_out):
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	94 """Initialize the parameters for the multilayer perceptron
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	95
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	96 :type rng: numpy.random.RandomState
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	97 :param rng: a random number generator used to initialize weights
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	98
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	99 :type input: theano.tensor.TensorType
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	100 :param input: symbolic variable that describes the input of the
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	101 architecture (one minibatch)
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	102
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	103 :type n_in: int
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	104 :param n_in: number of input units, the dimension of the space in
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	105 which the datapoints lie
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	106
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	107 :type n_hidden: int
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	108 :param n_hidden: number of hidden units
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	109
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	110 :type n_out: int
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	111 :param n_out: number of output units, the dimension of the space in
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	112 which the labels lie
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	113
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	114 """
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	115
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	116 # Since we are dealing with a one hidden layer MLP, this will
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	117 # translate into a HiddenLayer with a tanh activation connected to
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	118 # the LogisticRegression layer; the activation function can be
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	119 # replaced by sigmoid or any other nonlinear function
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	120 self.hiddenLayer = HiddenLayer(rng = rng, input = input,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	121 n_in = n_in, n_out = n_hidden,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	122 activation = T.tanh)
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	123
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	124 # The logistic regression layer gets as input the hidden units
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	125 # of the hidden layer
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	126 self.logRegressionLayer = LogisticRegression(
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	127 input = self.hiddenLayer.output,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	128 n_in = n_hidden,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	129 n_out = n_out)
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	130
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	131 # L1 norm ; one regularization option is to enforce L1 norm to
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	132 # be small
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	133 self.L1 = abs(self.hiddenLayer.W).sum() \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	134 + abs(self.logRegressionLayer.W).sum()
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	135
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	136 # square of L2 norm ; one regularization option is to enforce
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	137 # square of L2 norm to be small
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	138 self.L2_sqr = (self.hiddenLayer.W**2).sum() \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	139 + (self.logRegressionLayer.W**2).sum()
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	140
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	141 # negative log likelihood of the MLP is given by the negative
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	142 # log likelihood of the output of the model, computed in the
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	143 # logistic regression layer
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	144 self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	145 # same holds for the function computing the number of errors
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	146 self.errors = self.logRegressionLayer.errors
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	147
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	148 # the parameters of the model are the parameters of the two layers it is
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	149 # made out of
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	150 self.params = self.hiddenLayer.params + self.logRegressionLayer.params
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	151
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	152
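The L1 and L2_sqr attributes of the MLP class above are weighted sums over the two weight matrices, and they are added to the negative log likelihood with the L1_reg and L2_reg factors. The NumPy sketch below illustrates that arithmetic on tiny hand-made matrices. The function names are hypothetical, and the softmax mean negative log likelihood is an assumption about what LogisticRegression computes, since that class is imported from logistic_sgd and not shown in this file.

```python
import numpy as np


def l1_penalty(weights):
    """Sum of absolute values over all weight matrices (biases excluded)."""
    return sum(np.abs(W).sum() for W in weights)


def l2_sqr_penalty(weights):
    """Sum of squared entries over all weight matrices (biases excluded)."""
    return sum((W ** 2).sum() for W in weights)


def negative_log_likelihood(logits, y):
    """Assumed form: mean negative log probability of the correct class
    under a softmax over the rows of `logits`."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(y.shape[0]), y].mean()


# toy stand-ins for hiddenLayer.W and logRegressionLayer.W
W_hidden = np.array([[1.0, -2.0], [0.5, 0.0]])
W_out = np.array([[-1.0], [3.0]])
weights = [W_hidden, W_out]

l1 = l1_penalty(weights)
l2_sqr = l2_sqr_penalty(weights)

# uniform predictions over 10 classes for a minibatch of 4 examples
logits = np.zeros((4, 10))
y = np.array([0, 1, 2, 3])
nll = negative_log_likelihood(logits, y)

L1_reg, L2_reg = 0.00, 0.0001  # defaults of test_mlp
cost = nll + L1_reg * l1 + L2_reg * l2_sqr
print(l1, l2_sqr, cost)
```

With L1_reg set to 0.00 the L1 term drops out of the cost entirely, so under the defaults only the squared L2 term regularizes the weights.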
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	153 def test_mlp( learning_rate=0.01, L1_reg = 0.00, L2_reg = 0.0001, n_epochs=1000,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	154 dataset = 'mnist.pkl.gz'):
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	155 """
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	156 Demonstrate stochastic gradient descent optimization for a multilayer
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	157 perceptron
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	158
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	159 This is demonstrated on MNIST.
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	160
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	161 :type learning_rate: float
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	162 :param learning_rate: learning rate used (factor for the stochastic
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	163 gradient)
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	164
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	165 :type L1_reg: float
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	166 :param L1_reg: L1-norm's weight when added to the cost (see
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	167 regularization)
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	168
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	169 :type L2_reg: float
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	170 :param L2_reg: L2-norm's weight when added to the cost (see
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	171 regularization)
2 bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	172
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	173 :type n_epochs: int
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	174 :param n_epochs: maximal number of epochs to run the optimizer
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	175
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	176 :type dataset: string
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	177 :param dataset: the path of the MNIST dataset file from
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	178 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	179
2 bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	180
bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	181 """
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	182 datasets = load_data(dataset)
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	183
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	184 train_set_x, train_set_y = datasets[0]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	185 valid_set_x, valid_set_y = datasets[1]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	186 test_set_x , test_set_y = datasets[2]
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	187
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	188
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	189
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	190 batch_size = 20 # size of the minibatch
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	191
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	192 # compute number of minibatches for training, validation and testing
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	193 n_train_batches = train_set_x.value.shape[0] / batch_size
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	194 n_valid_batches = valid_set_x.value.shape[0] / batch_size
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	195 n_test_batches = test_set_x.value.shape[0] / batch_size
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	196
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	197 ######################
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	198 # BUILD ACTUAL MODEL #
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	199 ######################
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	200 print '... building the model'
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	201
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	202 # allocate symbolic variables for the data
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	203 index = T.lscalar() # index to a [mini]batch
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	204 x = T.matrix('x') # the data is presented as rasterized images
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	205 y = T.ivector('y') # the labels are presented as 1D vector of
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	206 # [int] labels
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	207
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	208 rng = numpy.random.RandomState(1234)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	209
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	210 # construct the MLP class
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	211 classifier = MLP( rng = rng, input=x, n_in=28*28, n_hidden = 500, n_out=10)
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	212
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	213 # the cost we minimize during training is the negative log likelihood of
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	214 # the model plus the regularization terms (L1 and L2); cost is expressed
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	215 # here symbolically
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	216 cost = classifier.negative_log_likelihood(y) \
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	217 + L1_reg * classifier.L1 \
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	218 + L2_reg * classifier.L2_sqr
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	219
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	220 # compiling a Theano function that computes the mistakes that are made
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	221 # by the model on a minibatch
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	222 test_model = theano.function(inputs = [index],
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	223 outputs = classifier.errors(y),
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	224 givens={
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	225 x:test_set_x[index*batch_size:(index+1)*batch_size],
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	226 y:test_set_y[index*batch_size:(index+1)*batch_size]})
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	227
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	228 validate_model = theano.function(inputs = [index],
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	229 outputs = classifier.errors(y),
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	230 givens={
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	231 x:valid_set_x[index*batch_size:(index+1)*batch_size],
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	232 y:valid_set_y[index*batch_size:(index+1)*batch_size]})
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	233
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	234 # compute the gradient of cost with respect to theta (stored in params)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	235 # the resulting gradients will be stored in a list gparams
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	236 gparams = []
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	237 for param in classifier.params:
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	238 gparam = T.grad(cost, param)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	239 gparams.append(gparam)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	240
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	241
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	242 # specify how to update the parameters of the model as a dictionary
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	243 updates = {}
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	244 # given two lists A = [a1,a2,a3,a4] and B = [b1,b2,b3,b4] of the
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	245 # same length, zip generates a list C of same size, where each element
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	246 # is a pair formed from the two lists :
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	247 # C = [ (a1,b1), (a2,b2), (a3,b3) , (a4,b4) ]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	248 for param, gparam in zip(classifier.params, gparams):
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	249 updates[param] = param - learning_rate*gparam
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	250
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	251 # compiling a Theano function `train_model` that returns the cost, but
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	252 # at the same time updates the parameters of the model based on the rules
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	253 # defined in `updates`
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	254 train_model =theano.function( inputs = [index], outputs = cost,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	255 updates = updates,
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	256 givens={
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	257 x:train_set_x[index*batch_size:(index+1)*batch_size],
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	258 y:train_set_y[index*batch_size:(index+1)*batch_size]})
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	259
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	260 ###############
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	261 # TRAIN MODEL #
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	262 ###############
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	263 print '... training'
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	264
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	265 # early-stopping parameters
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	266 patience = 10000 # look at this many examples regardless
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	267 patience_increase = 2 # wait this much longer when a new best is
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	268 # found
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	269 improvement_threshold = 0.995 # a relative improvement of this much is
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	270 # considered significant
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	271 validation_frequency = min(n_train_batches,patience/2)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	272 # go through this many
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	273 # minibatches before checking the network
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	274 # on the validation set; in this case we
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	275 # check every epoch
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	276
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	277
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	278 best_params = None
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	279 best_validation_loss = float('inf')
2 bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	280 best_iter = 0
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	281 test_score = 0.
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	282 start_time = time.clock()
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	283
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	284 epoch = 0
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	285 done_looping = False
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	286
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	287 while (epoch < n_epochs) and (not done_looping):
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	288 epoch = epoch + 1
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	289 for minibatch_index in xrange(n_train_batches):
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	290
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	291 minibatch_avg_cost = train_model(minibatch_index)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	292 # iteration number
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	293 iter = (epoch - 1) * n_train_batches + minibatch_index
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	294
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	295 if (iter+1) % validation_frequency == 0:
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	296 # compute zero-one loss on validation set
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	297 validation_losses = [validate_model(i) for i in xrange(n_valid_batches)]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	298 this_validation_loss = numpy.mean(validation_losses)
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	299
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	300 print('epoch %i, minibatch %i/%i, validation error %f %%' % \
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	301 (epoch, minibatch_index+1,n_train_batches, \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	302 this_validation_loss*100.))
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	303
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	304
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	305 # if we got the best validation score until now
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	306 if this_validation_loss < best_validation_loss:
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	307 #improve patience if loss improvement is good enough
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	308 if this_validation_loss < best_validation_loss * \
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	309 improvement_threshold :
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	310 patience = max(patience, iter * patience_increase)
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	311
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	312 best_validation_loss = this_validation_loss; best_iter = iter
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	313 # test it on the test set
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	314
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	315 test_losses = [test_model(i) for i in xrange(n_test_batches)]
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	316 test_score = numpy.mean(test_losses)
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	317
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	318 print((' epoch %i, minibatch %i/%i, test error of best '
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	319 'model %f %%') % \
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	320 (epoch, minibatch_index+1, n_train_batches,test_score*100.))
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	321
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	322 if patience <= iter :
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	323 done_looping = True
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	324 break
4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	325
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	326
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	327 end_time = time.clock()
2 bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	328 print(('Optimization complete. Best validation score of %f %% '
bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	329 'obtained at iteration %i, with test performance %f %%') %
bcc87d3e33a3 adding latest tutorial code Dumitru Erhan <dumitru.erhan@gmail.com> parents: 0 diff changeset	330 (best_validation_loss * 100., best_iter, test_score*100.))
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	331 print ('The code ran for %f minutes' % ((end_time-start_time)/60.))
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	332
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	333
fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	334 if __name__ == '__main__':
165 4bc5eeec6394 Updating the tutorial code to the latest revisions. Dumitru Erhan <dumitru.erhan@gmail.com> parents: 18 diff changeset	335 test_mlp()
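The minibatch slicing in the `givens` dictionaries and the `zip`-based update rule in the listing above can be tried without Theano. The following is a minimal sketch in plain Python, assuming ordinary lists in place of Theano shared variables; the helper names `minibatch` and `sgd_step` are hypothetical and do not appear in mlp.py.

```python
def minibatch(data, index, batch_size):
    # Same slice expression as in the givens dictionaries:
    # data[index*batch_size : (index+1)*batch_size]
    return data[index * batch_size:(index + 1) * batch_size]


def sgd_step(params, grads, learning_rate):
    # Pair each parameter with its gradient via zip and apply
    # param - learning_rate * gparam, as the updates dictionary does.
    return [p - learning_rate * g for p, g in zip(params, grads)]


data = list(range(10))
print(minibatch(data, 1, 4))                    # second minibatch of size 4
print(sgd_step([1.0, 2.0], [1.0, -2.0], 0.5))   # one update step
```

Note that the last minibatch is shorter when the dataset size is not a multiple of `batch_size`, because Python slices stop at the end of the sequence.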
0 fda5f787baa6 commit initial Dumitru Erhan <dumitru.erhan@gmail.com> parents: diff changeset	336
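The early-stopping rule used in the training loop can be isolated from Theano and replayed on a list of validation losses. The following is a simplified sketch, not the tutorial's implementation: it assumes validation happens once per epoch and checks patience only at the end of each epoch (the listing checks it after every minibatch). The function name `run_early_stopping` and the sample losses are hypothetical.

```python
def run_early_stopping(val_losses, n_train_batches, patience=10,
                       patience_increase=2, improvement_threshold=0.995):
    """Replay the patience rule on precomputed per-epoch validation losses.

    Returns (best_loss, best_iter, last_epoch_run).
    """
    best_loss = float('inf')
    best_iter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        # 0-based index of the last minibatch update in this epoch
        it = epoch * n_train_batches - 1
        if loss < best_loss:
            # extend patience only when the improvement is significant
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        if patience <= it:
            return best_loss, best_iter, epoch
    return best_loss, best_iter, len(val_losses)


losses = [0.5, 0.4, 0.39, 0.41, 0.42, 0.43, 0.44, 0.45]
print(run_early_stopping(losses, n_train_batches=5))
```

With these made-up losses the best value is reached in the third epoch, patience is extended to 28 iterations, and the loop stops once the iteration counter passes that bound.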
