# HG changeset patch
# User Olivier Delalleau
# Date 1289576359 18000
# Node ID 6b9673d72a4153fcffc1f5c4849d52f8c6914aab
# Parent 7548dc1b163c239ac838cb5779e7969ed232add3
Datalearn replies / comments

diff -r 7548dc1b163c -r 6b9673d72a41 doc/v2_planning/datalearn.txt
--- a/doc/v2_planning/datalearn.txt	Thu Nov 11 22:40:01 2010 -0500
+++ b/doc/v2_planning/datalearn.txt	Fri Nov 12 10:39:19 2010 -0500
@@ -53,6 +53,16 @@
 just include those 'expressions coded in individual datasets' into the
 overall graph.

+OD replies to James: What I had in mind is that you would be forced to compile
+your own function inside the perform() method of an Op. This seemed like a
+potential problem to me because it would prevent Theano from seeing the whole
+fine-grained graph and doing optimizations across multiple dataset
+transformations (there may also be additional overhead from calling multiple
+functions). But if you are saying it is possible to include 'expressions coded
+in individual datasets' into the overall graph, then I guess this point is
+moot. Would this be achieved with an optimization that replaces the dataset
+node with its internal graph?
+
 Razvan comments: 1) Having Theano expressions inside the perform of a Theano
 Op can lead to issues. I know I had to deal with a few when implementing
 Scan which does exactly this. Well to be fair these issues mostly come into
@@ -69,7 +79,14 @@
 indices, the dataset class can reload parts of the data into the shared
 variable and so on.

-
+OD replies to Razvan's point 2: I think what you are saying is another concern
+I had, which was the fact that it may be confusing to mix in the same class
+the Variable/Op and DataSet interfaces. I would indeed prefer to keep them
+separate. However, it may be possible to come up with a system that would get
+the best of both worlds (maybe by having the Op/Variable as members of
+Dataset, and just asking the user building a theano graph to use these instead
+of the dataset directly). Note that I'm mixing up Op/Variable here, because
+it's just not clear to me yet which would go where...

 One issue with this approach is illustrated by the following example. Imagine
 we want to iterate on samples in a dataset and do something with their
@@ -143,6 +160,11 @@
   certain nodes of the graph to reduce the number of compilation while in
   approach (2) we don't need to deal with the complexity of lazy
   compilation
+
+OD comments: Well, to be fair, it means we put the burden of dealing with the
+complexity of lazy compilation on the user (it's up to him to make sure he
+compiles only one function).
+
 - approach (1) needs a replace function if you want to change the dataset.
   What you would do, is once you have a "computational graph" or pipeline or
   whatever you call it, say ``graph``, to change the input you would do
@@ -174,6 +196,14 @@
     for datapoint in new_graph:
         do_something_with(datapoint())

+
+OD comments: I don't really understand what 'graph' is in this code (it
+appears in both approaches but is used differently). What I have in mind would
+be more like the first approach you describe (#2) with 'graph' removed, and
+graph / new_graph replaced by dataset / new_dataset in the second one (#1).
+You wouldn't need to call some graph.replace method: the graphs compiled for
+iterating on 'dataset' and 'new_dataset' would be entirely separate (using two
+different compiled functions, pretty much like #2).

 - in approach (1) the initial dataset object (the one that loads the data)
   decides if you will use shared variables and indices to deal with the
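
To make the iteration pattern in the comment above more concrete, here is a
minimal sketch of approach (#2) as described there: one Theano function is
compiled per dataset, outside the loop, and switching to a new dataset simply
means compiling a second, entirely separate function (no ``graph.replace``
call). The shared-variable layout and the ``do_something_with`` placeholder
are assumptions made for illustration only, not a proposed API.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    def do_something_with(value):
        # Placeholder for whatever is done with each sample's output.
        print(value)

    # The whole dataset lives in a shared variable; a scalar index picks a sample.
    data = numpy.random.randn(100, 5).astype(theano.config.floatX)
    shared_data = theano.shared(data, name='dataset')
    index = T.lscalar('index')
    sample = shared_data[index]
    output = T.tanh(T.dot(sample, sample))   # some per-sample transformation

    # Compiled once, outside the loop: the user carries the burden of making
    # sure only one function is compiled.
    f = theano.function([index], output)
    for i in range(data.shape[0]):
        do_something_with(f(i))

    # Switching dataset: just build and compile a second function around the
    # new shared data, entirely separate from the first one.
    new_data = numpy.random.randn(50, 5).astype(theano.config.floatX)
    new_shared = theano.shared(new_data, name='new_dataset')
    new_sample = new_shared[index]
    f_new = theano.function([index], T.tanh(T.dot(new_sample, new_sample)))
    for i in range(new_data.shape[0]):
        do_something_with(f_new(i))
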
@@ -225,7 +255,7 @@
   hyper-parameters for which you need to recompile the thenao function and
   can not be just parameters ( so we would have yet another category ?).

-Another syntactic option for iterating over datasets is
+James: Another syntactic option for iterating over datasets is

 .. code-block:: python

@@ -237,6 +267,12 @@
 numeric_iterator function can also specify what compile mode to use, any
 givens you might want to apply, etc.

+OD comments: Would there also be some kind of function cache to avoid
+compiling the same function again if we re-iterate on the same dataset with
+the same arguments? Maybe a more generic issue is: would there be a way for
+Theano to be more efficient when re-compiling the same function that was
+already compiled in the same program? (note that I am assuming here it is not
+efficient, but I may be wrong).

 What About Learners?
 --------------------
@@ -251,6 +287,15 @@
 the constructor of the learner? That seems much more flexible, compact, and
 clear than the decorator.

+OD replies: Not sure I understand your idea here. We probably want a learner
+to be able to compute its output on multiple datasets, without having to point
+to these datasets within the learner itself (which seems cumbersome to me).
+The point of the decorators is mostly to turn a single function (that outputs
+a theano variable for the output computed on a single sample) into a function
+that can compute symbolic datasets as well as numeric sample outputs. Those
+could instead be different functions in the base Learner class if the
+decorator approach is considered ugly / confusing.
+
 A Learner may be able to compute various things. For instance, a Neural
 Network may output a ``prediction`` vector (whose elements correspond to
 estimated probabilities of each class in a classification task), as well as a
@@ -330,3 +375,5 @@

 Is this close to what you are suggesting?

+OD: Yes, you guessed right, the decorator's role is to do something different
+depending on the input to the function (see my reply to James above).
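
As one possible illustration of a decorator that "does something different
depending on the input to the function", here is a minimal sketch: symbolic
input stays symbolic, numeric input triggers a one-time compilation of a
Theano function that is then cached and reused. The names ``datalearn_output``,
``Learner`` and ``prediction``, as well as the naive per-instance cache, are
made up for the example and are not the actual proposed API.

.. code-block:: python

    import functools
    import numpy
    import theano
    import theano.tensor as T

    def datalearn_output(fn):
        """Symbolic input -> symbolic output; numeric input -> compile a
        Theano function once, cache it, and call it on the data."""
        cache = {}
        @functools.wraps(fn)
        def wrapper(self, x):
            if isinstance(x, theano.Variable):
                return fn(self, x)            # stay in the symbolic world
            key = id(self)                    # naive per-instance cache
            if key not in cache:
                sym_x = T.vector('x')
                cache[key] = theano.function([sym_x], fn(self, sym_x))
            return cache[key](x)              # numeric path reuses the function
        return wrapper

    class Learner(object):
        def __init__(self, w):
            self.w = theano.shared(numpy.asarray(w, dtype=theano.config.floatX))

        @datalearn_output
        def prediction(self, sample):
            # Written once, in terms of a single symbolic sample.
            return T.nnet.sigmoid(T.dot(self.w, sample))

    learner = Learner([0.1, -0.2, 0.3])
    symbolic_out = learner.prediction(T.vector('s'))   # returns a Theano variable
    numeric_out = learner.prediction(
        numpy.ones(3, dtype=theano.config.floatX))      # returns a number

This also touches the function-cache question above: in this sketch the
compiled function is only built the first time numeric data is passed in, and
reused on subsequent numeric calls.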