annotate onehotop.py @ 431:0f8c81b0776d

Adding file make_test_datasets to host simple data-generating processes to create artificial datasets meant to test various learning algorithms.
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 29 Jul 2008 10:19:25 -0400
parents 18702ceb2096
children
rev   line source
356
18702ceb2096 Added more functions
Joseph Turian <turian@iro.umontreal.ca>
parents:
diff changeset
"""
One hot Op
"""

#from theano import tensor
from theano.tensor import as_tensor, Tensor
from theano.gof import op
from theano.gof.graph import Apply

import numpy

class OneHot(op.Op):
    """
    Construct a one-hot vector, x out of y.

    @todo: Document inputs and outputs
    @todo: Use 'bool' as output dtype? Or, at least 'int64' ? Not float64!
    @todo: Use 'bool' as output dtype, not 'int64' ?
    @todo: Allow this to operate on column vectors (Tensor)
    @todo: Describe better.
    """

    def make_node(self, x, y):
        """
        @type x: Vector L{Tensor} of integers
        @param x: The entries of the one-hot vector to be one.
        @type y: Integer scalar L{Tensor}
        @param y: The length (#columns) of the one-hot vectors.
        @return: A L{Tensor} of one-hot vectors

        @precondition: x < y for all entries of x
        @todo: Check that x and y are int types
        """
        x = as_tensor(x)
        y = as_tensor(y)
        #assert x.dtype[0:3] == "int"
        #assert y.dtype[0:3] == "int"
        inputs = [x, y]
        ##outputs = [tensor.Tensor("int64", broadcastable=[False, False])]
        #outputs = [tensor.Tensor("float64", broadcastable=[False, False])]
        #outputs = [Tensor("int64", broadcastable=[False, False])]
        outputs = [Tensor("float64", broadcastable=[False, False]).make_result()]
        node = Apply(op = self, inputs = inputs, outputs = outputs)
        return node

    def perform(self, node, (x, y), (out, )):
        # x: 1-d integer array of column indices; y: 0-d integer (row length).
        assert x.dtype == "int64" or x.dtype == "int32"
        assert x.ndim == 1
        assert y.dtype == "int64" or y.dtype == "int32"
        assert y.ndim == 0
        # One row per entry of x, y columns, all zeros to start.
        out[0] = numpy.zeros((x.shape[0], y), dtype="float64")
        for c in range(x.shape[0]):
            assert x[c] < y
            # Set the single "hot" entry of row c.
            out[0][c, x[c]] = 1

    def grad(self, (x, y), (out_gradient, )):
        return None, None

one_hot = OneHot()
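
To make the behaviour of perform concrete, here is a minimal standalone sketch of the same computation in plain NumPy, outside Theano. The name one_hot_rows is hypothetical and not part of onehotop.py, and the sketch targets Python 3 rather than the Python 2 syntax of the listing. It also rejects negative indices, which is an added assumption: the listing only asserts x[c] < y.

```python
import numpy as np


def one_hot_rows(x, y):
    """Return a (len(x), y) float64 array with a single 1 per row.

    Hypothetical helper mirroring OneHot.perform: row c gets a 1 in
    column x[c] and zeros everywhere else.
    """
    x = np.asarray(x)
    if x.ndim != 1 or not np.issubdtype(x.dtype, np.integer):
        raise ValueError("x must be a 1-d array of integers")
    y = int(y)
    # Assumption beyond the listing: negative indices are rejected too,
    # since NumPy would otherwise silently wrap them around.
    if x.size and (x.min() < 0 or x.max() >= y):
        raise ValueError("every entry of x must satisfy 0 <= x < y")
    out = np.zeros((x.shape[0], y), dtype="float64")
    # Integer-array indexing sets out[c, x[c]] for every row c at once,
    # replacing the explicit Python loop in perform.
    out[np.arange(x.shape[0]), x] = 1.0
    return out
```

Calling one_hot_rows([0, 2, 1], 3) yields a 3x3 array whose rows have their 1 in columns 0, 2 and 1 respectively. The float64 output matches what the listing allocates, although its @todo notes suggest an integer or boolean dtype might be preferable.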