view doc/v2_planning/arch_src/plugin_JB_comments_RP.txt @ 1419:cff305ad9f60

TensorFnDataset - added x_ attribute that caches the dataset function return value, but does not get pickled.
author James Bergstra <bergstrj@iro.umontreal.ca>
date Fri, 04 Feb 2011 16:05:22 -0500
parents 699ed5f5f188

I agree with Ian, maybe using caps is not the best idea. It reminds me of BASIC, which I used a long time ago :). It also makes the code look a bit scary.

JB replies: personally i think it makes the code look more AWESOME but I could
go either way.  See reply to Ian in plugin_JB_comments_IG.txt

I like the approach and I think it comes close to my earliest proposition and to what I am proposing for the layer committee ( though we have not had a meeting yet). 
I would though write it in a more Theano-like way ( Ian has an example of how that would look). I would also drop the CALL and FILT constructs, and actually have a 
decorator ( or something ) that wraps around a function to transform it into a call or filt. I hope that this is only syntactic sugar ( does this change anything
in the actual implementation ?? ) that makes things more natural. What I want to reach is something that looks very much like Theano, just that now you are creating
the graph of execution steps. Refactoring what you wrote, this will look like

x = buffer_repeat( 1000, dataset.next())
train_pca = pca.analyze(x)

train_pca.run()

If you allow a FILT to also get multiple inputs ( so not just the one), which comes naturally in this way of writing, you can get to describe a DAG that not only 
describes the order of execution but also deals with what takes data from what. I'm sorry for not being there yesterday; from what I remember I have the 
feeling that for you that is done under the hood and not taken care of by these flow control structures. 

To be a bit more explicit, in the way of writing the code above you can see that :
  a) dataset.next() has to run before pca.analyze
  b) pca.analyze needs the result (data) object of buffer_repeat( 1000, dataset.next()) 
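
A minimal sketch of how a decorator could turn plain functions into deferred graph nodes, so that both points a) and b) fall out of the way the code is written. The names `Node`, `filt`, and the toy bodies of `dataset_next`, `buffer_repeat` and `pca_analyze` are assumptions made for illustration; they are not taken from plugin_JB.py or plugin_RP.py.

```python
import functools


class Node:
    """A deferred call: stores the function and its inputs, runs nothing yet."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args

    def run(self, trace=None):
        # Inputs that are Nodes are evaluated first, so a producer always
        # runs before the node that consumes its result.
        values = [a.run(trace) if isinstance(a, Node) else a for a in self.args]
        if trace is not None:
            trace.append(self.fn.__name__)
        return self.fn(*values)


def filt(fn):
    """Hypothetical decorator: calling the wrapped function builds a Node."""

    @functools.wraps(fn)
    def build(*args):
        return Node(fn, args)

    return build


@filt
def dataset_next():
    return [1.0, 2.0, 3.0]          # stand-in for a real minibatch


@filt
def buffer_repeat(n, batch):
    return batch * n                # list repetition, a toy "buffer"


@filt
def pca_analyze(x):
    return sum(x) / len(x)          # toy statistic, not a real PCA


x = buffer_repeat(2, dataset_next())
train_pca = pca_analyze(x)          # still only a graph, nothing has run

order = []
result = train_pca.run(order)       # execution happens here
```

In this sketch a node that feeds two consumers would be evaluated twice; a real implementation would presumably cache results per node.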

I've actually elaborated on this idea here and there, and figured out what the result from such a control flow thing is, and how to make everything explicit 
in the graph. Parts of this are in my plugin_RP.py ( Step 1) though it is a bit of a moving target. I also have a slightly different way of writing REPEAT 
and BUFFER_REPEAT .. though I think it is mostly the same. I actually did not know how to deal with distributed things until I saw how you deal with that in your code.
Copy-pasted a version of a SDAA with my way of writing : 

    ## Layer 1:

    data_x,data_y = GPU_transform(load_mnist())
    noisy_data_x  = gaussian_noise(data_x, amount = 0.1)
    hidden1       = tanh(dotW_b(data_x, n_units = 200))
    reconstruct1  = reconstruct(hidden1.replace(data_x, noisy_data_x),
                            noisy_data_x)
    err1          = cross_entropy(reconstruct1, data_x)
    learner1      = SGD(err1)

    # Layer 2 :
    noisy_hidden1 = gaussian_noise(hidden1, amount = 0.1)
    hidden2       = tanh(dotW_b(hidden1, n_units = 200))
    reconstruct2  = reconstruct(hidden2.replace(hidden1,noisy_hidden1),
                            noisy_hidden1)
    err2          = cross_entropy(reconstruct2, hidden1)
    learner2      = SGD(err2)

    # Top layer:

    output  = sigmoid(dotW_b(hidden2, n_units = 10))
    err     = cross_entropy(output, data_y)
    learner = SGD(err)


GPU_transform,gaussian_noise and so on are functions that have been decorated ( or classes if you want) 
that you would write using FILT.  Reconstruct for me is a different CONTROL FLOW element. 
In this case I don't use REPEAT or BUFFER_REPEAT or the other very cool control flow elements, but you
can easily imagine writing something like

pretrained_in_parallel = weave( learner1, learner2)
results = spawn(repeat(5000,learner1),repeat(500,learner2))


JB replies:

  This reply makes it clearer to me that I was not sensitive enough to the
  difference between *expressions* and *control-flow statements*.  What you have
  above is a graph of declarative expressions (if I understand correctly) with
  certain properties:

    - they have no side effects
    - they can be re-ordered within dependency constraints

  Contrast this with the CALL statements in my proposal:

    - they work primarily by side effect
    - they cannot be re-ordered at all

  So the fact that CALL currently works by side effect means that there is
  almost no graph-manipulation that can be guaranteed not to change the program.
  This is a reason to make CALL statements *encapsulate* programs constructed
  using declarative constructs (i.e. Theano functions)
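
A small sketch of the distinction drawn above, using toy functions that are assumptions for illustration only. Side-effect-free expressions give the same values in any order that respects their dependencies, while steps that mutate shared state give different results when swapped.

```python
# Pure expressions: neither depends on the other, so order does not matter.
def double(v):
    return 2 * v


def inc(v):
    return v + 1


forward = (double(3), inc(3))
i, d = inc(3), double(3)            # evaluated in the opposite order
backward = (d, i)


# Side-effecting steps: each one rewrites the shared state in place.
def call_double(state):
    state["x"] *= 2


def call_inc(state):
    state["x"] += 1


def run(program, x0):
    """Run the steps strictly in the order given and return the final x."""
    state = {"x": x0}
    for step in program:
        step(state)
    return state["x"]


double_then_inc = run([call_double, call_inc], 3)
inc_then_double = run([call_inc, call_double], 3)
```

Here `forward` and `backward` are equal, while `double_then_inc` and `inc_then_double` differ, which is why reordering the second kind of program cannot be assumed safe.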

  In other words, in the short term, this feels to me like the reason to *not*
  mix Theano graph building with this control-flow business.  

  Consequently, I think I will remove the BUFFER_REPEAT construct since that is
  really an expression masquerading as a control flow statement, and I will
  remove FILT too.

RP asks:

  I understand now the difference between what you wrote and what I had in
  mind. Though I don't understand the argument against it. Do you mean to say
  that writing it the way I proposed implies a much more complicated backbone
  framework which will take us too long to develop? Or is there something else
  that you meant ?


JB replies:
  
  I don't think it's necessary to combine theano with this control-flow
  proposal, and I don't know how to do it.  Yes, it seems like it would be hard
  and/or awkward, and I don't even really see the advantage of even trying to do
  it.

RP:

  I think you misunderstood me. I did not propose to mix Theano with the
  library. I agree that would be awkward. What I had in mind ( which might be
  just something different from what you are doing) is to use some concepts
  from how Theano deals with things. 
  
  For example right now you added registers. Writing something like:

  CALL( fn, arg1, arg2, arg3, _set= reg('x') )

  means actually  reg('x') = fn (arg1,arg2,arg3)

  You get most of what you want because these control flow elements don't
  actually get executed until you run the program. That means that you 
  have a fixed simple graph ( you can't play around with it) that tells
  how to execute your commands. You can save that graph, and the point in 
  the graph where you stop so that you can resume later. You can also 
  save all registers at that point. 
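
A rough sketch of the register mechanism as described above: CALL records a function and its arguments, nothing runs until the program is executed, and the position plus the registers can be saved and restored. The classes `Reg`, `CALL` and `Program`, and the `snapshot`/`restore` methods, are hypothetical stand-ins written for this illustration and are not the implementation in plugin_JB.py.

```python
import operator


class Reg:
    """Named slot whose value is looked up only when a step executes."""

    def __init__(self, name):
        self.name = name


def reg(name):
    return Reg(name)


class CALL:
    """Deferred call: CALL(fn, a, b, _set=reg('x')) means reg('x') = fn(a, b)."""

    def __init__(self, fn, *args, _set=None):
        self.fn, self.args, self._set = fn, args, _set

    def step(self, registers):
        values = [registers[a.name] if isinstance(a, Reg) else a for a in self.args]
        out = self.fn(*values)
        if self._set is not None:
            registers[self._set.name] = out


class Program:
    """A fixed list of steps plus a program counter and a register file."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.pc = 0
        self.registers = {}

    def run(self, max_steps=None):
        done = 0
        while self.pc < len(self.steps) and (max_steps is None or done < max_steps):
            self.steps[self.pc].step(self.registers)
            self.pc += 1
            done += 1
        return self.registers

    def snapshot(self):
        # Copy so that later steps do not mutate the saved registers.
        return self.pc, dict(self.registers)

    def restore(self, saved):
        self.pc, self.registers = saved[0], dict(saved[1])


prog = Program([
    CALL(operator.add, 1, 2, _set=reg("x")),
    CALL(operator.mul, reg("x"), 10, _set=reg("y")),
])
prog.run(max_steps=1)               # stop after the first step
saved = prog.snapshot()

resumed = Program(prog.steps)       # same fixed graph, fresh state
resumed.restore(saved)
final = resumed.run()               # continues from the saved point
```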


  Why not have that fn instead of being a python function, be some class
  that implements a method run which does what your call would do. 
  The init/ or __call__ of that class would do what CALL does in your case.
  Do you think that would be impossible to implement? Any such function 
  could either return a set of registers or not. 

  Your other control flow things will be just special such functions. 
  The only thing that might look a bit strange would be the sequence
  in case you need to return things. Maybe I could use the same trick,
  namely a _set argument to __call__.
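
One way this could look, as a sketch only: a base class whose `__call__` binds arguments (the role CALL plays) and whose `run` method does the work, with a control-flow element written as just another subclass. `Step`, `Bound`, `Reg`, `Add` and `Repeat` are names invented for this illustration.

```python
class Reg:
    """Named register reference, resolved at execution time."""

    def __init__(self, name):
        self.name = name


class Bound:
    """A step plus its arguments; nothing runs until execute() is called."""

    def __init__(self, step, args, target):
        self.step, self.args, self.target = step, args, target

    def execute(self, registers):
        values = [registers[a.name] if isinstance(a, Reg) else a for a in self.args]
        out = self.step.run(registers, *values)
        if self.target is not None:
            registers[self.target.name] = out
        return out


class Step:
    """Base class: subclasses implement run(); __call__ only records the call."""

    def __call__(self, *args, _set=None):
        return Bound(self, args, _set)

    def run(self, registers, *values):
        raise NotImplementedError


class Add(Step):
    def run(self, registers, a, b):
        return a + b


class Repeat(Step):
    """Control flow as a special Step: execute a bound body n times."""

    def run(self, registers, n, body):
        out = None
        for _ in range(n):
            out = body.execute(registers)
        return out


registers = {"total": 0}
add, repeat = Add(), Repeat()

body = add(Reg("total"), 5, _set=Reg("total"))     # total = total + 5
program = repeat(3, body, _set=Reg("last"))        # built, not yet executed
program.execute(registers)
```

Passing `registers` into `run` is a design choice of this sketch so that control-flow steps can drive their bodies; ordinary steps simply ignore it.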

  I'm not against your approach, I just think it can be written a bit 
  differently, which in my opinion is easier to read, understand and so 
  on. I have nothing against it if we decide to write it exactly how you
  propose and I'm sure that I will get the hang of it pretty fast.


  Bottom line (in my view):
    - I don't say we should mix Theano with anything
    - I think writing things such that it looks like applying functions to 
    objects is a more natural way, easy to understand for noobs
    - Writing a new function by inheriting from a class and implementing a method
    is also natural
    - I do not propose to do optimizations or play with the graph ! I do
      though think that you should be able to : 
        * replace parts of a subgraph with a different one
        * automatically collect hyper-parameters or parameters if you ever want
        to
        * change the value of these somehow
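
The three capabilities in the last bullet can be sketched on a toy graph. The `Node` class and its `replace`, `collect_hyper` and `describe` methods are assumptions made for this illustration; only the names `data_x`, `gaussian_noise`, `tanh` and `dotW_b` echo the SDAA example earlier in this file.

```python
class Node:
    """Toy graph node: a name, input nodes, and keyword hyper-parameters."""

    def __init__(self, name, inputs=(), **hyper):
        self.name = name
        self.inputs = list(inputs)
        self.hyper = dict(hyper)

    def replace(self, old, new):
        """Return a copy of this subgraph with `old` swapped for `new`."""
        if self is old:
            return new
        rebuilt = [i.replace(old, new) for i in self.inputs]
        return Node(self.name, rebuilt, **self.hyper)

    def collect_hyper(self):
        """Gather hyper-parameters from this node and everything upstream."""
        found = {}
        for i in self.inputs:
            found.update(i.collect_hyper())
        for key, value in self.hyper.items():
            found[f"{self.name}.{key}"] = value
        return found

    def describe(self):
        if not self.inputs:
            return self.name
        return f"{self.name}({', '.join(i.describe() for i in self.inputs)})"


data_x = Node("data_x")
noisy_data_x = Node("gaussian_noise", [data_x], amount=0.1)
hidden1 = Node("tanh", [Node("dotW_b", [data_x], n_units=200)])

# Replace part of the subgraph: feed the noisy input instead of the clean one.
corrupted = hidden1.replace(data_x, noisy_data_x)

# Collect hyper-parameters automatically, then change one of them.
before = corrupted.collect_hyper()
noisy_data_x.hyper["amount"] = 0.2
after = corrupted.collect_hyper()
```

The original `hidden1` is left untouched by `replace`, which keeps both the clean and the corrupted paths available.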