comparison doc/v2_planning/learner.txt @ 1058:e342de3ae485

v2planning learner - added comments and TODO points
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 09 Sep 2010 11:49:57 -0400
parents bc3f7834db83
children f082a6c0b008
etc.) Such a learner would replace synchronous instructions (return on completion) with
asynchronous ones (return after scheduling) and the active instruction set would also change
asynchronously, but neither of these things is inconsistent with the Learner API.


TODO - Experiment API?
~~~~~~~~~~~~~~~~~~~~~~

I feel like something is missing from the API - and that is an interface to the graph structure
discussed above. The nodes in this graph are natural places to store meta-information for
visualization, statistics-gathering etc. But none of the APIs above corresponds to the graph
itself. In other words, there is no API through which to attach information to nodes. It is
not good to say that the Learner instance *is* the node because (a) learner instances change
during graph exploration and (b) learner instances are big, and we don't want to have to keep a
whole saved model just to attach meta-info e.g. validation score. Choosing this API spills
over into other committees, so we should get their feedback about how to resolve
it. Maybe we need an 'Experiment' API to stand for this graph?
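A minimal sketch of what such an 'Experiment' graph might look like. All names below (`Experiment`, `Node`) are hypothetical, not part of any existing API; the point is only that meta-info such as a validation score hangs off a lightweight node object, not off the (large) learner instance itself.

```python
class Node(object):
    """A lightweight stand-in for one point in the learner-state graph."""
    def __init__(self, key):
        self.key = key    # e.g. a hash of the instruction sequence so far
        self.meta = {}    # meta-info: validation score, timing, etc.

class Experiment(object):
    """The graph itself: nodes plus edges labelled by instructions."""
    def __init__(self):
        self.nodes = {}   # key -> Node
        self.edges = []   # (from_key, instruction_name, to_key)

    def node(self, key):
        # Create the node lazily; no saved model is required to hold meta-info.
        if key not in self.nodes:
            self.nodes[key] = Node(key)
        return self.nodes[key]

    def add_edge(self, from_key, instruction, to_key):
        self.edges.append((from_key, instruction, to_key))

# Usage: record a validation score without keeping the whole model around.
exp = Experiment()
exp.add_edge('init', 'train(100)', 'state_1')
exp.node('state_1').meta['valid_error'] = 0.23
```

Whether the graph is built eagerly or reconstructed from logs is left open; the sketch only shows where the meta-info would live.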
TODO: Validation & Monitoring Costs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if we do have the Experiment API as a structure on which to hang validation and
monitoring results, what should be the mechanism for extracting those results?
The Learner API is not right, because extracting a monitoring cost doesn't change
the model, doesn't change the legal instructions/edges, etc. Maybe we should use
a mechanism similar to Instruction, called something like Measurement? Any node
/ learner can report the list of instructions (for moving) and the list of
measurements (and the cost of computing them too).

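A sketch of what a 'Measurement' counterpart to Instruction could look like. Everything here is hypothetical (`Measurement`, `FakeLearner`); the idea is just that a node/learner advertises read-only measurements with an estimated cost, so that extracting a monitoring value never mutates the model or its legal instruction set.

```python
from collections import namedtuple

# name: identifier; cost: rough compute-cost estimate; compute: a thunk that
# returns the value without changing the learner's state.
Measurement = namedtuple('Measurement', ['name', 'cost', 'compute'])

class FakeLearner(object):
    """Toy learner used only to illustrate the proposed interface."""
    def __init__(self):
        self.n_updates = 0

    def instructions(self):
        # Instructions move the learner to a new node in the graph.
        return ['train(N)']

    def measurements(self):
        # Measurements are read-only; computing one leaves the node fixed.
        return [
            Measurement('n_updates', cost=0.0,
                        compute=lambda: self.n_updates),
            Measurement('valid_error', cost=10.0,
                        compute=lambda: 1.0 / (1 + self.n_updates)),
        ]

learner = FakeLearner()
learner.n_updates = 3
by_name = {m.name: m for m in learner.measurements()}
# Reading a measurement does not change the legal instruction set.
err = by_name['valid_error'].compute()
```

The cost field would let a driver decide whether an expensive measurement (e.g. a full validation pass) is worth computing at a given node.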
TODO - Parameter Distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

YB asks: it seems to me that what we really need from "Type" is not just
testing that a value is legal, but more practically a function that specifies the
prior distribution for the hyper-parameter, i.e., how to sample from it,
and possibly some representation of it that could be used to infer
Having the min and max and default limits us to the uniform distribution,
which may not always be appropriate. For example sometimes we'd like
Gaussian (-infty to infty) or Exponential (0 to infty) or Poisson (non-negative integers).
For that reason, I think that "Type" is not a very good name.
How about "Prior" or "Density" or something like that?

JB replies: I agree that being able to choose (and update) distributions over
these values is important. I don't think the Type structure is the right place
to handle it, though. The challenge is to allow those distributions to change
for a variety of reasons - e.g. the sampling distribution on the capacity
variables is affected by the size of the dataset, and it is also affected by
previous experience in general as well as by experiments on that particular
dataset. I'm not sure that the 'Type' structure is right to deal with this.
Also, even with a strategy for handling these distributions, I believe a simple
mechanism for rejecting insane values might be useful.

So how should we handle it? Hmmm...
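One possible shape, sketched under assumptions (all names hypothetical): a `Prior` object owns sampling, as YB suggests, while a separate, cheap `is_sane` predicate rejects clearly-broken values, as JB suggests, independently of whatever distribution is currently in use. The distribution can then be swapped or re-fit from past experiments without touching the sanity check.

```python
import random

class Prior(object):
    """Pairs a sampling distribution with an independent sanity check."""
    def __init__(self, sample_fn, is_sane):
        self.sample_fn = sample_fn
        self.is_sane = is_sane

    def sample(self, rng):
        # Rejection loop: the distribution proposes, the sanity check disposes.
        while True:
            v = self.sample_fn(rng)
            if self.is_sane(v):
                return v

# Different hyper-parameters get different densities, not just (min, max):
learning_rate = Prior(
    sample_fn=lambda rng: rng.lognormvariate(-5.0, 1.0),   # positive, skewed
    is_sane=lambda v: 0.0 < v < 1.0)
n_hidden = Prior(
    sample_fn=lambda rng: int(rng.expovariate(1.0 / 500)),  # 0 to infty
    is_sane=lambda v: 1 <= v <= 100000)

rng = random.Random(0)
samples = [learning_rate.sample(rng) for _ in range(100)]
```

Updating a prior from experience would then amount to replacing `sample_fn`; the sanity check stays put. This is only one point in the design space, not a resolution of the question above.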


Comments
~~~~~~~~

OD asks: (I hope it's ok to leave comments even though I'm not in committee... I'm
interested to see how the learner interface is shaping up so I'll be keeping
an eye on this file)
I'm wondering what's the benefit of such an API compared to simply defining a