comparison doc/v2_planning/learner.txt @ 1044:3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Wed, 08 Sep 2010 12:55:30 -0400 |
parents | 3f528656855b |
children | d57bdd9a9980 |
1043:3f528656855b | 1044:3b1fd599bafd |
---|---|
265 itself. In other words, there is no API through which to attach information to nodes. It is | 265 itself. In other words, there is no API through which to attach information to nodes. It is |
266 not good to say that the Learner instance *is* the node because (a) learner instances change | 266 not good to say that the Learner instance *is* the node because (a) learner instances change |
267 during graph exploration and (b) learner instances are big, and we don't want to have to keep a | 267 during graph exploration and (b) learner instances are big, and we don't want to have to keep a |
268 whole saved model just to attach meta-info e.g. validation score. Choosing this API spills | 268 whole saved model just to attach meta-info e.g. validation score. Choosing this API spills |
269 over into other committees, so we should get their feedback about how to resolve it. | 269 over into other committees, so we should get their feedback about how to resolve it. |
270 | |
271 Just another view/spin on the same idea (Razvan) | |
272 ================================================ | |
273 | |
274 | |
275 My idea is probably just a spin-off from what James wrote. It is an extension | |
276 of what I sent to the mailing list some time ago. | |
277 | |
278 Big Picture | |
279 ----------- | |
280 | |
281 What do we care about ? | |
282 ~~~~~~~~~~~~~~~~~~~~~~~ | |
283 | |
284 This is the list of the main points that I have in mind: | |
285 | |
286 * Re-usability | |
287 * Extensibility | |
288 * Simplicity, i.e. easily readable code (connected to re-usability) | |
289 * Modularity (connected to extensibility) | |
290 * Code that is fast to write (this more or less follows from simplicity) | |
291 * Efficient code | |
292 | |
293 | |
294 Composition | |
295 ~~~~~~~~~~~ | |
296 | |
297 To me this reads as code generated by composing pieces. Imagine this: | |
298 you start off with something primitive that I will call a "variable", which | |
299 is probably a very unsuitable name. You then compose those initial | |
300 "variables" or transform them through several "functions". Each such | |
301 "function" hides some logic that you, as the user, don't care about. | |
302 You can have low-level or micro "functions" and high-level or macro | |
303 "functions", where a high-level function is just a certain compositional | |
304 pattern of low-level "functions". There are several classes of "functions" | |
305 and "variables" whose members are interchangeable. This is how modularity is | |
306 obtained: by swapping one function for another from the same class. | |
307 | |
308 Now when you want to research something, what you do is first select | |
309 the step you want to look into. If you are lucky you can re-write this | |
310 step as a certain decomposition of low-level transformations (there can be | |
311 multiple such decompositions). If not, you have to implement such a | |
312 decomposition according to your needs. Pick the low-level transformations you want | |
313 to change and write new versions that implement your logic. | |
314 | |
315 I think the code will be easy to read, because it is just applying a fixed | |
316 set of transformations, one after the other. Whoever writes the code can | |
317 decide how explicit to be by switching between high-level | |
318 and low-level functions. | |
319 | |
320 I think code written this way is re-usable, because you can just take this chain | |
321 of transformations and replace the one you care about, without looking into | |
322 the rest. | |
323 | |
324 You get this fractal property of the code. Zooming in, you always get just | |
325 a set of functions applied to a set of variables. In the beginning those might | |
326 not be there, and you would have to create new "low level" decompositions, | |
327 maybe even new "variables" that carry data between those decompositions. | |
328 | |
329 The thing with variables here is that I don't want these "functions" to have | |
330 a state. All the information is passed along through these variables. This | |
331 way understanding the graph is easy, and debugging it is also easier (than having | |
332 all these hidden states). | |
333 | |
334 Note that while doing so we might (and I strongly think we should) create | |
335 a (symbolic) DAG of operations (this is where it becomes what James was saying). | |
336 In such a DAG the "variables" will be the nodes and the functions will be the edges. | |
337 I think having a DAG is useful in many ways (all of these are things that one | |
338 might think about implementing in the far future; I'm not proposing to implement | |
339 them unless we want to use them, like the reconstruction): | |
340 * there is the possibility of writing optimizations (Theano style) | |
341 * there is the possibility of adding global-view utility functions (like | |
342 a reconstruction function for SdA - extremely low level here), or global | |
343 view diagnostic tools | |
344 * the possibility of creating a GUI (where you just create the graph by | |
345 picking transforms and variables from a list) or working interactively | |
346 and then generating code that will reproduce the graph | |
347 * you can view the graph at different granularity levels to understand | |
348 things (global diagnostics) | |
349 | |
350 We should have a taxonomy of possible classes of functions and possible | |
351 classes of variables, but those should not be exclusive. We can work at a high | |
352 level for now, and decompose those high level functions to lower level when | |
353 we need to. We can introduce new classes of functions or intermediate | |
354 variables between those low level functions. | |
355 | |
356 | |
357 Similarities with James' idea | |
358 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
359 | |
360 As I said before, I think this is just another view on what James proposed. | |
361 The learner in his case is the module that traverses the graph of these | |
362 operations, which makes sense here as well. | |
363 | |
364 The 'execute' command in his API is just applying a function to some variables in | |
365 my case. | |
366 | |
367 I think that in both cases the learner keeps track of the graph that is formed. | |
368 | |
369 His view is a bit more general. I see the graph as fully created by the user, | |
370 and the learner just has to go from the start to the end. In his case the | |
371 traversal is conditioned on some policies. I think these ideas can be mixed / | |
372 united. To get this functionality in my case, I would use something | |
373 similar to the lazy linker for Theano. | |
374 | |
375 | |
376 | |
377 |
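
The composition idea from the "Composition" section can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not a proposed pylearn API: the names `Variable`, `apply`, `compose` and `evaluate`, and the three toy transformations, are all hypothetical. It shows stateless "functions" applied to "variables", a symbolic DAG in which variables are nodes and functions are edges, a macro "function" built from micro ones, and a trivial traversal that goes from the start of the graph to the end.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Variable:
    """A node of the DAG: a name plus the edge (function, inputs) producing it."""
    name: str
    fn: Optional[Callable] = None        # None marks an input variable
    inputs: Tuple["Variable", ...] = ()


def apply(name, fn, *inputs):
    """Record 'fn applied to inputs' symbolically; nothing is computed yet."""
    return Variable(name, fn, tuple(inputs))


def compose(*fns):
    """Build a high-level "function" as a fixed chain of low-level ones."""
    def macro(name, var):
        for i, fn in enumerate(fns):
            var = apply(f"{name}.{i}", fn, var)
        return var
    return macro


def evaluate(var, values, cache=None):
    """Walk the graph from the inputs to `var`, computing each node once."""
    cache = {} if cache is None else cache
    if var.name in values:               # input supplied by the caller
        return values[var.name]
    if var.fn is None:
        raise KeyError(f"no value given for input variable {var.name!r}")
    if var.name not in cache:
        args = [evaluate(v, values, cache) for v in var.inputs]
        cache[var.name] = var.fn(*args)
    return cache[var.name]


# Low-level, stateless "functions" (toy examples, assumptions of this sketch).
def center(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def square(xs):
    return [x * x for x in xs]


def total(xs):
    return sum(xs)


data = Variable("data")
sum_of_squares = compose(center, square, total)   # a macro "function"
result = sum_of_squares("ss", data)

print(evaluate(result, {"data": [1.0, 2.0, 3.0]}))  # prints 2.0
```

Because the functions carry no state, replacing one step, for example `compose(center, total)` instead of `compose(center, square, total)`, changes only that edge of the graph and leaves the rest of the chain untouched.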