annotate sandbox/statscollector.py @ 484:3daabc7f94ff

Added Yoshua's explanation
author Joseph Turian <turian@gmail.com>
date Tue, 28 Oct 2008 01:33:27 -0400
parents d7611a3811f2
children
rev   line source
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
diff changeset
1
192
f62a03c9d485 Redesign of StatsCollector
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1
diff changeset
2 # Here is how I see stats collectors:
1
3
345
4efb503fd0da Added test for dataset/RenamedFieldsDataSet
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 326
diff changeset
4 def my_stats(graph):
5     graph.mse=examplewise_mean(square_norm(graph.residue))
6     graph.training_loss=graph.regularizer+examplewise_sum(graph.nll)
7     return [graph.mse,graph.training_loss]
8
9
10 # def my_stats(residue,nll,regularizer):
192
11 #     mse=examplewise_mean(square_norm(residue))
326
fe57b96f33d4 made ExampleWiseMean Op
Olivier Breuleux <breuleuo@iro.umontreal.ca>
parents: 209
diff changeset
12 #     training_loss=regularizer+examplewise_sum(nll)
192
13 #     set_names(locals())
14 #     return ((residue,nll),(regularizer,),(),(mse,training_loss))
15 # my_stats_collector = make_stats_collector(my_stats)
16 #
17 # where make_stats_collector calls my_stats(examplewise_fields, attributes) to
18 # construct its update function and to work out which input fields (here "residue"
19 # and "nll") and input attributes (here "regularizer") it needs, and which output
20 # attributes it computes (here "mse" and "training_loss"). Remember that, in my
21 # jargon, fields are examplewise quantities but attributes are not.
22 # In the above example, I am highlighting that some operations done in my_stats
23 # are examplewise and some are not. I am hoping that theano Ops can do these
24 # kinds of internal side-effect operations (and properly initialize the hidden
25 # variables involved). I expect that a StatsCollector (returned by make_stats_collector)
26 # provides the following attributes and methods:
27 #     stats_collector.input_field_names
28 #     stats_collector.input_attribute_names
29 #     stats_collector.output_attribute_names
30 #     stats_collector.update(mini_dataset)
31 #     stats_collector['mse']
32 # where mini_dataset has the input_field_names as fields and the input_attribute_names
33 # as attributes, and in the resulting dataset the output_attribute_names are set to the
34 # proper numeric values.
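The interface described in the comment above can be sketched in plain Python, without theano. This is only an illustration under stated assumptions: the class name SketchStatsCollector is hypothetical, update() takes two plain dicts (fields and attributes) instead of a mini_dataset object, and the hidden accumulators that the comment hopes theano Ops could maintain are ordinary instance variables here.

```python
class SketchStatsCollector:
    """Hypothetical stand-in for the collector described in the design comment."""

    input_field_names = ["residue", "nll"]        # examplewise inputs
    input_attribute_names = ["regularizer"]       # one value per dataset
    output_attribute_names = ["mse", "training_loss"]

    def __init__(self):
        # Hidden accumulators, initialized once and updated by every minibatch.
        self._n_examples = 0
        self._sum_square_norm = 0.0
        self._sum_nll = 0.0
        self._outputs = dict.fromkeys(self.output_attribute_names)

    def update(self, fields, attributes):
        """fields maps a field name to a list with one entry per example;
        attributes maps an attribute name to a single value."""
        for residue, nll in zip(fields["residue"], fields["nll"]):
            self._n_examples += 1
            self._sum_square_norm += sum(r * r for r in residue)
            self._sum_nll += nll
        # Examplewise mean of the squared norm of the residue.
        self._outputs["mse"] = self._sum_square_norm / self._n_examples
        # Non-examplewise regularizer plus the examplewise sum of nll.
        self._outputs["training_loss"] = attributes["regularizer"] + self._sum_nll

    def __getitem__(self, name):
        return self._outputs[name]


collector = SketchStatsCollector()
collector.update({"residue": [[1.0, 2.0]], "nll": [0.5]}, {"regularizer": 0.5})
collector.update({"residue": [[0.0, 1.0], [3.0, 0.0]], "nll": [0.25, 0.25]},
                 {"regularizer": 0.5})
```

After the two updates, collector['mse'] and collector['training_loss'] reflect all three examples seen so far, which is the accumulate-across-minibatches behaviour the comment asks for.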
1
35
192
36
37 # NOTE: AttributesHolder, used as a base class below, is not imported in this file.
38 import theano
39 from theano import tensor as t
40 from Learner import Learner
41 from lookup_list import LookupList
42
43 class StatsCollectorModel(AttributesHolder):
44     def __init__(self,stats_collector):
45         self.stats_collector = stats_collector
46         self.outputs = LookupList(stats_collector.output_names,[None for name in stats_collector.output_names])
47         # the statistics get initialized here
48         self.update_function = theano.function(stats_collector.input_attributes+stats_collector.input_fields,stats_collector.outputs,linker="c|py")
49         for name,value in self.outputs.items():
50             setattr(self,name,value)
51     def update(self,dataset):
52         input_fields = dataset.fields()(self.stats_collector.input_field_names)
53         input_attributes = dataset.getAttributes(self.stats_collector.input_attribute_names)
54         self.outputs._values = self.update_function(input_attributes+input_fields)
55         for name,value in self.outputs.items():
56             setattr(self,name,value)
57     def __call__(self):
58         return self.outputs
59     def attributeNames(self):
60         return self.outputs.keys()
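The model mirrors every output value onto itself as an attribute. In Python this is done with the built-in setattr (object defines __setattr__; there is no __setattribute__ hook). A minimal sketch of that mirroring, assuming a plain dict in place of LookupList and a hypothetical OutputsHolder class:

```python
class OutputsHolder:
    """Hypothetical holder that exposes named outputs as attributes."""

    def __init__(self, names):
        # Every output starts as None until the first update.
        self.outputs = dict.fromkeys(names)
        self._mirror()

    def _mirror(self):
        # Copy each (name, value) pair onto the instance.
        for name, value in self.outputs.items():
            setattr(self, name, value)

    def update(self, values):
        # Pair the existing names, in order, with the new values.
        self.outputs = dict(zip(self.outputs, values))
        self._mirror()


holder = OutputsHolder(["mse", "training_loss"])
holder.update([0.5, 2.0])
```

After the update, holder.mse and holder.outputs["mse"] refer to the same value.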
1
61
192
62 class StatsCollector(AttributesHolder):
63
64     def __init__(self,input_attributes, input_fields, outputs):
65         self.input_attributes = input_attributes
66         self.input_fields = input_fields
67         self.outputs = outputs
68         self.input_attribute_names = [v.name for v in input_attributes]
69         self.input_field_names = [v.name for v in input_fields]
70         self.output_names = [v.name for v in outputs]
71
72     def __call__(self,dataset=None):
73         model = StatsCollectorModel(self)
74         if dataset is not None:
75             model.update(dataset)
76         return model
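Calling the collector acts as a factory: each call builds a fresh model and, if a dataset is given, feeds it to that model. A minimal sketch of the pattern, with hypothetical SketchCollector and SketchModel classes standing in for the real ones. It also shows why comparing against None is safer than testing truthiness when an empty container may be a valid dataset.

```python
class SketchModel:
    """Hypothetical model that only counts how often it was updated."""

    def __init__(self, collector):
        self.collector = collector
        self.n_updates = 0

    def update(self, dataset):
        self.n_updates += 1


class SketchCollector:
    """Hypothetical collector: calling it returns a new model."""

    def __call__(self, dataset=None):
        model = SketchModel(self)
        # An empty list is falsy, so "if dataset:" would skip it silently.
        if dataset is not None:
            model.update(dataset)
        return model


collector = SketchCollector()
untouched = collector()
updated = collector([])
```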
1
77
192
78 if __name__ == '__main__':
79     def my_statscollector():
80         regularizer = t.scalar()
81         nll = t.matrix()
82         class_error = t.matrix()
83         total_loss = regularizer+t.examplewise_sum(nll)
84         avg_nll = t.examplewise_mean(nll)
85         avg_class_error = t.examplewise_mean(class_error)
209
50a8302addaf template statscollector
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 192
diff changeset
86         for name,val in locals().items(): val.name = name
192
87         return StatsCollector([regularizer],[nll,class_error],[total_loss,avg_nll,avg_class_error])
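Naming each variable after the local name it is bound to, as the loop over locals() does here, can be factored into a helper like the set_names(locals()) call mentioned in the design comment. A minimal sketch, assuming a hypothetical Variable class with a name attribute in place of theano variables. The helper iterates over a copy of the namespace and skips anything that is not a Variable.

```python
class Variable:
    """Hypothetical stand-in for a symbolic variable with a name."""

    def __init__(self):
        self.name = None


def set_names(namespace):
    # Iterate over a copy so the mapping cannot change size under the loop.
    for name, value in list(namespace.items()):
        if isinstance(value, Variable):
            value.name = name


def build():
    regularizer = Variable()
    nll = Variable()
    batch_size = 32  # not a Variable, so set_names leaves it alone
    set_names(locals())
    return regularizer, nll


regularizer, nll = build()
```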
88
89
90
1
91
192
92 # OLD DESIGN:
93 #
94 # class StatsCollector(object):
95 #     """A StatsCollector object is used to record performance statistics during training
96 #     or testing of a learner. It can be configured to measure different things and
97 #     accumulate the appropriate statistics. From these statistics it can be interrogated
98 #     to obtain performance measures of interest (such as maxima, minima, mean, standard
99 #     deviation, standard error, etc.). Optionally, the observations can be weighted
100 #     (yielding weighted mean, weighted variance, etc., where applicable). The statistics
101 #     that are desired can be chosen from the list supported by the StatsCollector
102 #     class or subclass. When some statistics are requested, others become automatically
103 #     available (e.g., sum or mean)."""
104 #
105 #     default_statistics = [mean,standard_deviation,min,max]
106 #
107 #     def __init__(self,n_quantities_observed, statistics=default_statistics):
108 #         self.n_quantities_observed=n_quantities_observed
109 #
110 #     def clear(self):
111 #         raise NotImplementedError
112 #
113 #     def update(self,observations):
114 #         """observations is a numpy vector of length n_quantities_observed. Some
115 #         entries can be 'missing' (with a NaN entry) and will not be counted in the
116 #         statistics."""
117 #         raise NotImplementedError
118 #
119 #     def __getattr__(self, statistic):
120 #         """Return a particular statistic, which may be inferred from the collected statistics.
121 #         The argument is a string naming that statistic."""
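The old design above leaves clear() and update() unimplemented. One way they could work is sketched below, under stated assumptions: the class name RunningStats is hypothetical, observations are plain Python floats rather than a numpy vector, statistics are read through methods rather than __getattr__, and the standard deviation is the population (not sample) version. NaN entries are treated as missing and skipped, as the docstring describes.

```python
import math


class RunningStats:
    """Hypothetical accumulator for per-quantity mean, std, min and max."""

    def __init__(self, n_quantities_observed):
        self.n_quantities_observed = n_quantities_observed
        self.clear()

    def clear(self):
        n = self.n_quantities_observed
        self.count = [0] * n
        self.total = [0.0] * n
        self.total_sq = [0.0] * n
        self.minimum = [math.inf] * n
        self.maximum = [-math.inf] * n

    def update(self, observations):
        if len(observations) != self.n_quantities_observed:
            raise ValueError("wrong number of observations")
        for i, x in enumerate(observations):
            if math.isnan(x):
                continue  # missing entry: not counted in any statistic
            self.count[i] += 1
            self.total[i] += x
            self.total_sq[i] += x * x
            self.minimum[i] = min(self.minimum[i], x)
            self.maximum[i] = max(self.maximum[i], x)

    def mean(self, i):
        return self.total[i] / self.count[i]

    def standard_deviation(self, i):
        # Population variance from the running sums, clamped at zero
        # to absorb rounding error.
        m = self.mean(i)
        variance = self.total_sq[i] / self.count[i] - m * m
        return math.sqrt(max(variance, 0.0))


stats = RunningStats(2)
stats.update([1.0, float("nan")])
stats.update([3.0, 4.0])
```

Keeping only running sums means the mean and standard deviation become available without storing the observations, which matches the remark that some statistics come for free once others are requested.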
1
122
123
124
125
126
127