annotate doc/seriestables.txt @ 1152:0904dd74894d

Added to coding style information a wiki link about my Vim setup recommendations.
author David Warde-Farley <wardefar@iro.umontreal.ca>
date Thu, 16 Sep 2010 17:11:10 -0400
parents 34d1cd516f76
911
fdb63e4e042d Had forgotten to hg add SeriesTable .txt doc
fsavard
parents:
diff changeset
.. SeriesTables documentation master file, created by
   sphinx-quickstart on Wed Mar 10 17:56:41 2010.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Introduction to ``SeriesTables``
--------------------------------

SeriesTables was created to make it easier to **record scalar data series**, most notably the **evolution of errors (training, validation, test) during training**. I foresee other common use cases, such as **recording basic statistics (mean, min/max, variance) of parameters** during training to diagnose problems.

I also think that if such recording is easily accessible, it might lead us to record other statistics, such as statistics on the activations in the network (e.g. to diagnose unit saturation problems).

Each **element of a series is indexed and timestamped**. By default, for example, the index is named "epoch", which means that an epoch number is stored with each row (this can easily be customized). By default, the timestamp at row creation time is also stored, along with the CPU ``clock()`` time. This makes it possible to plot error series against either the epoch or the training time.

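As a rough illustration of what "indexed and timestamped" means, the following sketch builds rows by hand in plain Python. It is not the SeriesTables implementation: ``make_row`` and the dictionary layout are assumptions made for this example, and ``time.process_time()`` stands in for the C ``clock()`` call.

.. code-block:: python

    import time

    def make_row(index, value, index_names=("epoch",),
                 value_name="validation_error"):
        """Build one row: index columns, wall-clock timestamp, CPU time, value."""
        if len(index) != len(index_names):
            raise ValueError("index must have one entry per index name")
        row = dict(zip(index_names, index))
        row["timestamp"] = time.time()          # wall-clock time at row creation
        row["cpuclock"] = time.process_time()   # CPU time used by this process
        row[value_name] = value
        return row

    rows = [make_row((epoch,), error)
            for epoch, error in [(1, 32.0), (2, 28.0), (3, 26.0)]]

    # Either column can serve as the x axis of an error plot.
    by_epoch = [(r["epoch"], r["validation_error"]) for r in rows]
    by_time = [(r["timestamp"], r["validation_error"]) for r in rows]
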
Series are saved in HDF5 files, which I'll introduce briefly.

Introduction to PyTables and HDF5
---------------------------------

HDF5_ is a file format intended for the storage of big numerical datasets. In practice, for our purposes, you'll create a single ``.h5`` file in which many tables, corresponding to different series, will be stored. Datasets in a single file are organized hierarchically, in the equivalent of "folders" called "groups". The "files" in this analogy are our tables.

.. _HDF5: http://www.hdfgroup.org/HDF5/

A useful property of HDF5 is that metadata is stored along with the data itself. Notably, the table names and column names are stored inside the file. We can also attach more data, such as a title or even complex objects (which will be pickled), as attributes.

PyTables_ is a Python library for working with the HDF5 format.

.. _PyTables: http://www.pytables.org/moin/HowToUse

Here's a basic Python session in which I create a new file and store a few rows in a single table:

>>> import tables
>>>
>>> hdf5_file = tables.openFile("mytables.h5", "w")
>>>
>>> # Create a new subgroup under the root group "/"
... mygroup = hdf5_file.createGroup("/", "mygroup")
>>>
>>> # Define the type of data we want to store
... class MyDescription(tables.IsDescription):
...     int_column_1 = tables.Int32Col(pos=0)
...     float_column_1 = tables.Float32Col(pos=1)
...
>>> # Create a table under mygroup
... mytable = hdf5_file.createTable("/mygroup", "mytable", MyDescription)
>>>
>>> newrow = mytable.row
>>>
>>> # a first row
... newrow["int_column_1"] = 15
>>> newrow["float_column_1"] = 30.0
>>> newrow.append()
>>>
>>> # and a second row
... newrow["int_column_1"] = 16
>>> newrow["float_column_1"] = 32.0
>>> newrow.append()
>>>
>>> # make sure we write to disk
... hdf5_file.flush()
>>>
>>> hdf5_file.close()


And here's a session in which I reload the data and explore it:

>>> import tables
>>>
>>> hdf5_file = tables.openFile("mytables.h5", "r")
>>>
>>> mytable = hdf5_file.getNode("/mygroup", "mytable")
>>>
>>> # tables can be "sliced" this way
... mytable[0:2]
array([(15, 30.0), (16, 32.0)],
      dtype=[('int_column_1', '<i4'), ('float_column_1', '<f4')])
>>>
>>> # or we can access columns individually
... mytable.cols.int_column_1[0:2]
array([15, 16], dtype=int32)


Using ``SeriesTables``: a basic example
---------------------------------------

Here's a very basic usage example:

>>> import tables
>>> from pylearn.io.seriestables import *
>>>
>>> tables_file = tables.openFile("series.h5", "w")
>>>
>>> error_series = ErrorSeries(error_name="validation_error", \
...                            table_name="validation_error", \
...                            hdf5_file=tables_file)
>>>
>>> error_series.append((1,), 32.0)
>>> error_series.append((2,), 28.0)
>>> error_series.append((3,), 26.0)

I can then open the file ``series.h5``, which will contain a table named ``validation_error`` with a column named ``epoch`` and another named ``validation_error``. There will also be ``timestamp`` and ``cpuclock`` columns, as this is the default behavior. The table rows will correspond to the data added with ``append()`` above.

Indices
.......

You may notice that the first parameter of ``append()`` is a tuple. This is because the *index* may have multiple levels. The index is what gives the rows an order.

In the default case for ``ErrorSeries``, the index only has an "epoch" level, so the tuple only has one element. But in the ``ErrorSeries(...)`` constructor, you could have specified the ``index_names`` parameter, e.g. ``('epoch','minibatch')``, which would allow you to specify both the epoch and the minibatch as the index.


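To make the multi-level index concrete, here is a small plain-Python sketch, independent of SeriesTables. The ``rows`` list and the ``rows_for_epoch`` helper are hypothetical names used only for this illustration. The point is that index tuples compare level by level, so sorting on them orders rows by epoch first and by minibatch second.

.. code-block:: python

    index_names = ("epoch", "minibatch")

    # (index tuple, value) pairs, deliberately appended out of order
    rows = [
        ((1, 0), 0.90),
        ((0, 1), 1.10),
        ((0, 0), 1.25),
        ((1, 1), 0.85),
    ]

    # Tuples compare element by element, which yields epoch-major order.
    ordered = sorted(rows, key=lambda row: row[0])

    def rows_for_epoch(rows, epoch):
        """Return the values recorded during one epoch, in minibatch order."""
        selected = [row for row in rows if row[0][0] == epoch]
        return [value for _, value in sorted(selected, key=lambda row: row[0])]
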
Summary of the most useful classes
----------------------------------

By default, each of these series also has columns for the timestamp and the CPU ``clock()`` value at the time ``append()`` is called. This can be changed with the ``store_timestamp`` and ``store_cpuclock`` parameters of their constructors.

ErrorSeries
  This records one floating point (32 bit) value along with an index in a new table.

AccumulatorSeriesWrapper
  This wraps another Series and calls its ``append()`` method once its own ``append()`` has been called N times, N being a parameter given when constructing the ``AccumulatorSeriesWrapper``. A simple use case: say you want to store the mean of the training error every 100 minibatches. You create an ``ErrorSeries``, wrap it with an accumulator and then call the wrapper's ``append()`` for every minibatch. It will collect the errors, wait until it has 100, then take the mean (with ``numpy.mean``), store it in the ``ErrorSeries``, and start over again.
  Other "reducing" functions can be used instead of the mean.

BasicStatisticsSeries
  This stores the mean, the min, the max and the standard deviation of the arrays you pass to its ``append()`` method. This is useful, notably, to see how the weights (and other parameters) evolve during training without actually storing the parameters themselves.

SharedParamsStatisticsWrapper
  This wraps several ``BasicStatisticsSeries``. It is specifically designed so you can pass it a list of shared (as in ``theano.shared``) parameter arrays. Each array will get its own table, under a new HDF5 group. You can name each table, e.g. "layer1_b", "layer1_W", etc.

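The accumulate-then-reduce behavior described for ``AccumulatorSeriesWrapper`` can be sketched in plain Python. This is not the pylearn implementation: ``ListSeries`` and ``AccumulatorSketch`` are hypothetical stand-ins, ``statistics.mean`` replaces ``numpy.mean`` to keep the example dependency-free, and storing the reduced value under the index of the last accumulated call is an assumption of this sketch.

.. code-block:: python

    from statistics import mean

    class ListSeries:
        """Hypothetical base series that keeps (index, value) rows in memory."""
        def __init__(self):
            self.rows = []

        def append(self, index, value):
            self.rows.append((index, value))

    class AccumulatorSketch:
        """Forward one reduced value to base_series every reduce_every appends."""
        def __init__(self, base_series, reduce_every, reduce_function=mean):
            self.base_series = base_series
            self.reduce_every = reduce_every
            self.reduce_function = reduce_function
            self._buffer = []

        def append(self, index, value):
            self._buffer.append(value)
            if len(self._buffer) == self.reduce_every:
                # Assumption: the reduced row carries the index of the last call.
                reduced = self.reduce_function(self._buffer)
                self.base_series.append(index, reduced)
                self._buffer = []

    base = ListSeries()
    accumulated = AccumulatorSketch(base, reduce_every=3)
    for minibatch, error in enumerate([3.0, 2.0, 1.0, 6.0, 5.0, 4.0, 9.0]):
        accumulated.append((0, minibatch), error)

In this sketch, passing ``reduce_function=max`` would record the worst error of each window instead of the mean. Values that do not fill a complete window (the trailing ``9.0`` here) stay in the buffer and are not written.
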
Example of real usage
---------------------

The following is a function in which I create the series used to record errors and statistics about parameters in a stacked denoising autoencoder script:

.. code-block:: python

    import os

    import tables
    from pylearn.io.seriestables import *

    def create_series(num_hidden_layers):

        # Replace series we don't want to save with DummySeries, e.g.
        # series['training_error'] = DummySeries()

        series = {}

        basedir = os.getcwd()

        h5f = tables.openFile(os.path.join(basedir, "series.h5"), "w")

        # training error is accumulated over 100 minibatches,
        # then the mean is computed and saved in the training_base series
        training_base = \
            ErrorSeries(error_name="training_error",
                        table_name="training_error",
                        hdf5_file=h5f,
                        index_names=('epoch','minibatch'),
                        title="Training error (mean over 100 minibatches)")

        # this series wraps training_base, performs accumulation
        series['training_error'] = \
            AccumulatorSeriesWrapper(base_series=training_base,
                                     reduce_every=100)

        # valid and test are not accumulated/mean, saved directly
        series['validation_error'] = \
            ErrorSeries(error_name="validation_error",
                        table_name="validation_error",
                        hdf5_file=h5f,
                        index_names=('epoch',))

        series['test_error'] = \
            ErrorSeries(error_name="test_error",
                        table_name="test_error",
                        hdf5_file=h5f,
                        index_names=('epoch',))

        # next we want to store the parameters statistics
        # so first we create the names for each table, based on
        # position of each param in the array
        param_names = []
        for i in range(num_hidden_layers):
            param_names += ['layer%d_W'%i, 'layer%d_b'%i, 'layer%d_bprime'%i]
        param_names += ['logreg_layer_W', 'logreg_layer_b']


        series['params'] = SharedParamsStatisticsWrapper(
                                new_group_name="params",
                                base_group="/",
                                arrays_names=param_names,
                                hdf5_file=h5f,
                                index_names=('epoch',))

        return series

Then, here's an example of ``append()`` usage for each of these series, wrapped in pseudocode:

.. code-block:: python

    series = create_series(num_hidden_layers=3)

    ...

    for epoch in range(num_epochs):
        for mb_index in range(num_minibatches):
            train_error = finetune(mb_index)
            series['training_error'].append((epoch, mb_index), train_error)

        valid_error = compute_validation_error()
        series['validation_error'].append((epoch,), valid_error)

        test_error = compute_test_error()
        series['test_error'].append((epoch,), test_error)

        # suppose all_params is a list [layer1_W, layer1_b, ...]
        # where each element is a shared (as in theano.shared) array
        series['params'].append((epoch,), all_params)

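For intuition about what a call like ``series['params'].append((epoch,), all_params)`` records, the sketch below computes the four statistics that ``BasicStatisticsSeries`` is described as storing (mean, min, max, standard deviation) for each named array. It uses plain Python lists and the standard ``statistics`` module instead of Theano shared variables and NumPy. ``basic_statistics`` and ``params_statistics`` are hypothetical helper names, and the use of the population standard deviation is an assumption.

.. code-block:: python

    from statistics import mean, pstdev

    def basic_statistics(values):
        """Summarize a flat sequence of numbers with four scalars."""
        values = list(values)
        return {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "std": pstdev(values),  # population standard deviation (assumption)
        }

    def params_statistics(index, names, arrays):
        """Return one row of statistics per named array, tagged with the index."""
        if len(names) != len(arrays):
            raise ValueError("need exactly one name per array")
        return {name: (index, basic_statistics(array))
                for name, array in zip(names, arrays)}

    stats = params_statistics(
        index=(0,),
        names=["layer0_W", "layer0_b"],
        arrays=[[1.0, 2.0, 3.0, 4.0], [0.5, 0.5]],
    )

Each entry of ``stats`` holds four scalars per epoch regardless of how large the parameter array is, which is what makes this kind of recording cheap compared to saving the parameters themselves.
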
929
34d1cd516f76 Added other targets (printing to stdout, notably) to seriestables, and corresponding doc
fsavard
parents: 911
diff changeset
Other targets for appending (e.g. printing to stdout)
-----------------------------------------------------

SeriesTables was created with an HDF5 file in mind, but often, for debugging,
it's useful to be able to redirect the series elsewhere, notably to the standard
output. A mechanism was added to do just that.

To use it, create an ``AppendTarget`` instance (or more than one) and
pass it as an argument to the Series constructor. For example, to print every
appended row to the standard output, use ``StdoutAppendTarget``.

If you want to skip appending to the HDF5 file entirely, this is also
possible: simply specify ``skip_hdf5_append=True`` in the constructor. You
still need to pass in a valid HDF5 file, though, even though nothing will be
written to it (for, err, legacy reasons).

Here's an example:

.. code-block:: python

    import os

    import tables

    # ErrorSeries and StdoutAppendTarget come from the seriestables module.

    def create_series(num_hidden_layers):

        # Replace series we don't want to save with DummySeries, e.g.
        # series['training_error'] = DummySeries()

        series = {}

        basedir = os.getcwd()

        # openFile is the PyTables 2.x name; PyTables 3 and later spell
        # it open_file.
        h5f = tables.openFile(os.path.join(basedir, "series.h5"), "w")

        # Here we create the new target, with a message prepended
        # before every row is printed to stdout
        stdout_target = StdoutAppendTarget(
            prepend='\n-----------------\nValidation error',
            indent_str='\t')

        # Notice here we won't even write to the HDF5 file
        series['validation_error'] = \
            ErrorSeries(error_name="validation_error",
                        table_name="validation_error",
                        hdf5_file=h5f,
                        index_names=('epoch',),
                        other_targets=[stdout_target],
                        skip_hdf5_append=True)

        return series


Now calls to ``series['validation_error'].append()`` will print output like
this to stdout::

    ----------------
    Validation error
        timestamp : 1271202144
        cpuclock : 0.12
        epoch : 1
        validation_error : 30.0

    ----------------
    Validation error
        timestamp : 1271202144
        cpuclock : 0.12
        epoch : 2
        validation_error : 26.0

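The fan-out behaviour described in this section can be pictured with a small, self-contained mock. This is only a sketch of the idea, not the seriestables implementation: ``PrintTarget``, ``TinySeries`` and ``skip_store`` are hypothetical stand-ins for ``StdoutAppendTarget``, a Series class and ``skip_hdf5_append``, and rows are kept in a plain list instead of an HDF5 table.

```python
import io
import sys
import time


class PrintTarget:
    """Hypothetical stand-in for a stdout-style append target."""

    def __init__(self, prepend="", indent_str="", stream=None):
        self.prepend = prepend
        self.indent_str = indent_str
        self.stream = stream if stream is not None else sys.stdout

    def append(self, row):
        # Write the header message, then one "name : value" line per field.
        self.stream.write(self.prepend + "\n")
        for name, value in row.items():
            self.stream.write("%s%s : %s\n" % (self.indent_str, name, value))


class TinySeries:
    """Hypothetical series that stores rows in a list instead of HDF5."""

    def __init__(self, value_name, other_targets=(), skip_store=False):
        self.value_name = value_name
        self.other_targets = list(other_targets)
        self.skip_store = skip_store
        self.rows = []

    def append(self, epoch, value):
        row = {"timestamp": int(time.time()),
               "epoch": epoch,
               self.value_name: value}
        if not self.skip_store:
            self.rows.append(row)
        # Every extra target sees the row, whether or not it was stored.
        for target in self.other_targets:
            target.append(row)


# Capture the output in a buffer here; leave stream=None to print to stdout.
buffer = io.StringIO()
target = PrintTarget(prepend="Validation error", indent_str="\t", stream=buffer)
series = TinySeries("validation_error", other_targets=[target], skip_store=True)
series.append(1, 30.0)
series.append(2, 26.0)
printed = buffer.getvalue()
```

With ``skip_store=True`` nothing accumulates in ``series.rows``, yet each appended row still reaches the extra target, which mirrors the "print only, don't write to the file" setup of the example above.
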

Visualizing in vitables
-----------------------

vitables_ is a program with which you can easily explore an HDF5 ``.h5`` file. Here's a screenshot in which I visualize series produced for the preceding example:

.. _vitables: http://vitables.berlios.de/

.. image:: images/vitables_example_series.png
