
I agree with Ian, maybe using caps is not the best idea. It reminds me of BASIC, which I used a long time ago :). It also makes the code look a bit scary.

JB replies: personally I think it makes the code look more AWESOME, but I could
go either way.  See the reply to Ian in plugin_JB_comments_IG.txt.

I like the approach, and I think it comes close to my earliest proposition and to what I am proposing for the layer committee (though we have not had a meeting yet).
I would, though, write it in a more Theano-like way (Ian has an example of how that would look). I would also drop the CALL and FILT constructs and instead have a
decorator (or something similar) that wraps a function to turn it into a call or filt. I hope this is only syntactic sugar (does it change anything
in the actual implementation??) that makes things more natural. What I want to reach is something that looks very much like Theano, except that now you are building
the graph of execution steps. Refactoring what you wrote, this would look like:

x = buffer_repeat(1000, dataset.next())
train_pca = pca.analyze(x)

train_pca.run()
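
To make the "syntactic sugar" point concrete, here is a minimal sketch of the kind of decorator I have in mind (the names filt and Node, and the toy buffer_repeat below, are just illustrative, not the actual implementation):

    class Node(object):
        """One execution step: remembers the wrapped function and its inputs."""
        def __init__(self, fn, inputs):
            self.fn = fn
            self.inputs = inputs
        def run(self):
            # run input Nodes first, then this step (no result caching, for brevity)
            args = [a.run() if isinstance(a, Node) else a for a in self.inputs]
            return self.fn(*args)

    def filt(fn):
        """Decorator: calling fn(...) now builds a Node instead of running fn."""
        def wrapper(*inputs):
            return Node(fn, inputs)
        return wrapper

    @filt
    def buffer_repeat(n, source):
        # toy stand-in: pull n values from a callable data source
        return [source() for _ in range(n)]

    counter = iter(range(100))
    x = buffer_repeat(5, lambda: next(counter))   # builds a Node, nothing runs yet
    print(x.run())                                # -> [0, 1, 2, 3, 4]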

If you allow a FILT to also take multiple inputs (so not just one), which comes naturally in this way of writing, you end up describing a DAG that not only
gives the order of execution but also says which step takes its data from which. I'm sorry for not being there yesterday; from what I remember, I have the
feeling that for you this is done under the hood rather than being handled by these flow-control structures.

To be a bit more explicit, written this way the code above makes it visible that (see the small sketch after this list):
  a) dataset.next() has to run before pca.analyze
  b) pca.analyze needs the result (data) object of buffer_repeat(1000, dataset.next())
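
A tiny sketch of how those two constraints can be read back out of the recorded graph (the dependency dict below is written by hand, purely for illustration):

    # hand-written stand-in for the dependencies the Nodes above would record:
    # "step -> steps whose output it needs"
    deps = {
        'dataset.next':  [],
        'buffer_repeat': ['dataset.next'],
        'pca.analyze':   ['buffer_repeat'],
    }

    def topo_order(deps):
        """Return the steps in an order that respects every dependency."""
        order, seen = [], set()
        def visit(step):
            if step in seen:
                return
            seen.add(step)
            for d in deps[step]:
                visit(d)
            order.append(step)
        for step in deps:
            visit(step)
        return order

    print(topo_order(deps))   # -> ['dataset.next', 'buffer_repeat', 'pca.analyze']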

I've actually elaborated on this idea here and there, figured out what the result of such a control-flow construct is, and how to make everything explicit
in the graph. Parts of this are in my plugin_RP.py (Step 1), though it is a bit of a moving target. I also have a slightly different way of writing REPEAT
and BUFFER_REPEAT, though I think it is mostly the same. I actually did not know how to deal with distributed things until I saw how you deal with that in your code.
Here is a copy-pasted version of a stacked denoising autoencoder (SdA) written my way:

    ## Layer 1:

    data_x,data_y = GPU_transform(load_mnist())
    noisy_data_x  = gaussian_noise(data_x, amount = 0.1)
    hidden1       = tanh(dotW_b(data_x, n_units = 200))
    reconstruct1  = reconstruct(hidden1.replace(data_x, noisy_data_x),
                            noisy_data_x)
    err1          = cross_entropy(reconstruct1, data_x)
    learner1      = SGD(err1)

    # Layer 2:
    noisy_hidden1 = gaussian_noise(hidden1, amount = 0.1)
    hidden2       = tanh(dotW_b(hidden1, n_units = 200))
    reconstruct2  = reconstruct(hidden2.replace(hidden1,noisy_hidden1),
                            noisy_hidden1)
    err2          = cross_entropy(reconstruct2, hidden1)
    learner2      = SGD(err2)

    # Top layer:

    output  = sigmoid(dotW_b(hidden2, n_units = 10))
    err     = cross_entropy(output, data_y)
    learner = SGD(err)


GPU_transform, gaussian_noise and so on are functions that have been decorated (or classes, if you want)
that you would write using FILT. Reconstruct, for me, is a different control-flow element.
In this case I don't use REPEAT or BUFFER_REPEAT or the other very cool control-flow elements, but you
can easily imagine writing something like:

pretrained_in_parallel = weave(learner1, learner2)
results = spawn(repeat(5000, learner1), repeat(500, learner2))
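
A rough sketch of what weave and spawn could mean in plain Python (the Learner class and its step() method are stand-ins of mine; a real spawn would launch parallel processes, the version here just runs the jobs serially):

    class Learner(object):
        """Stand-in learner: step() does one (here, fake) training update."""
        def __init__(self):
            self.n_steps = 0
        def step(self):
            self.n_steps += 1

    def repeat(n, learner):
        """Package 'run this learner for n steps' as a job."""
        def job():
            for _ in range(n):
                learner.step()
        return job

    def weave(*learners):
        """Alternate one step of each learner (serial stand-in for interleaving)."""
        def job(n_rounds):
            for _ in range(n_rounds):
                for l in learners:
                    l.step()
        return job

    def spawn(*jobs):
        """Serial stand-in for launching jobs in parallel processes."""
        return [job() for job in jobs]

    learner1, learner2 = Learner(), Learner()
    weave(learner1, learner2)(10)                       # 10 interleaved pretraining rounds
    spawn(repeat(5000, learner1), repeat(500, learner2))
    print(learner1.n_steps, learner2.n_steps)           # learner1: 5010 steps, learner2: 510 steps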


JB replies:

  This reply makes it clearer to me that I was not sensitive enough to the
  difference between *expressions* and *control-flow statements*.  What you have
  above is a graph of declarative expressions (if I understand correctly) with
  certain properties:

    - they have no side effects
    - they can be re-ordered within dependency constraints

  Contrast this with the CALL statements in my proposal:

    - they work primarily by side effect
    - they cannot be re-ordered at all

  So the fact that CALL currently works by side effect means that there is
  almost no graph-manipulation that can be guaranteed not to change the program.
  This is a reason to make CALL statements *encapsulate* programs constructed
  using declarative constructs (i.e. Theano functions).
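
  (A minimal sketch of one way to read "encapsulate" here, not the actual proposal's
  CALL implementation: each CALL wraps an opaque, side-effectful callable, in practice
  e.g. a compiled Theano function, and the only guarantee the control-flow layer gives
  is that the CALLs run in order. The CALL class and train_one_epoch below are purely
  illustrative.)

      class CALL(object):
          """Illustrative only: a statement that runs a side-effectful callable."""
          def __init__(self, fn, *args):
              self.fn, self.args = fn, args
          def execute(self):
              self.fn(*self.args)   # side effects only; no value flows between CALLs

      state = {'epochs': 0}

      def train_one_epoch(state):
          # stand-in for invoking e.g. a compiled theano.function inside the CALL
          state['epochs'] += 1

      program = [CALL(train_one_epoch, state) for _ in range(3)]
      for stmt in program:        # fixed order: reordering is not safe in general
          stmt.execute()
      print(state['epochs'])      # -> 3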

  In other words, in the short term, this feels to me like the reason to *not*
  mix Theano graph building with this control-flow business.

  Consequently, I think I will remove the BUFFER_REPEAT construct since that is
  really an expression masquerading as a control flow statement, and I will
  remove FILT too.