# HG changeset patch
# User Frederic Bastien
# Date 1284771318 14400
# Node ID 0e12ea6ba6612903bb646baf2eb67f3df93afe7a
# Parent 073c2fab7bcd5676f74df1420d334bbc992ae53a
fix many rst syntax error warning.

diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/existing_python_ml_libraries.txt
--- a/doc/v2_planning/existing_python_ml_libraries.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/existing_python_ml_libraries.txt Fri Sep 17 20:55:18 2010 -0400
@@ -6,7 +6,7 @@

 * How much should we try to interface with other libraries?
 * What parts can we and should we implement ourselves and what should we leave
-to the other libraries?
+  to the other libraries?

 Preliminary list of libraries to look at:

@@ -22,5 +22,4 @@
 * scikits.learn   Guillaume (but could trade)

 Also check out http://scipy.org/Topical_Software#head-fc5493250d285f5c634e51be7ba0f80d5f4d6443
-- scipy.org's ``topical software'' section on Artificial Intelligence and
-  Machine Learning
+- scipy.org's ``topical software'' section on Artificial Intelligence and Machine Learning

diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/learner.txt
--- a/doc/v2_planning/learner.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/learner.txt Fri Sep 17 20:55:18 2010 -0400
@@ -9,33 +9,34 @@
 following semantics:

 * A learner has named hyper-parameters that control how it learns (these can be viewed
-as options of the constructor, or might be set directly by a user)
+  as options of the constructor, or might be set directly by a user)

 * A learner also has an internal state that depends on what it has learned.

 * A learner reads and produces data, so the definition of learner is
-intimately linked to the definition of dataset (and task).
+  intimately linked to the definition of dataset (and task).

 * A learner has one or more 'train' or 'adapt' functions by which
-it is given a sample of data (typically either the whole training set, or
-a mini-batch, which contains as a special case a single 'example'). Learners
-interface with datasets in order to obtain data. These functions cause the
-learner to change its internal state and take advantage to some extent
-of the data provided. The 'train' function should take charge of
-completely exploiting the dataset, as specified per the hyper-parameters,
-so that it would typically be called only once. An 'adapt' function
-is meant for learners that can operate in an 'online' setting where
-data continually arrive and the control loop (when to stop) is to
-be managed outside of it. For most intents and purposes, the
-'train' function could also handle the 'online' case by providing
-the controlled iterations over the dataset (which would then be
-seen as a stream of examples).
+  it is given a sample of data (typically either the whole training set, or
+  a mini-batch, which contains as a special case a single 'example'). Learners
+  interface with datasets in order to obtain data. These functions cause the
+  learner to change its internal state and take advantage to some extent
+  of the data provided. The 'train' function should take charge of
+  completely exploiting the dataset, as specified per the hyper-parameters,
+  so that it would typically be called only once. An 'adapt' function
+  is meant for learners that can operate in an 'online' setting where
+  data continually arrive and the control loop (when to stop) is to
+  be managed outside of it. For most intents and purposes, the
+  'train' function could also handle the 'online' case by providing
+  the controlled iterations over the dataset (which would then be
+  seen as a stream of examples).
+
   * learner.train(dataset)
   * learner.adapt(data)

 * Different types of learners can then exploit their internal state
-in order to perform various computations after training is completed,
-or in the middle of training, e.g.,
+  in order to perform various computations after training is completed,
+  or in the middle of training, e.g.,

   * y=learner.predict(x)
     for learners that see (x,y) pairs during training and predict y given x,
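For illustration, here is a minimal runnable sketch of the train/adapt/predict semantics described in the hunk above. The MeanLearner class and its toy running-mean update are assumptions made for the example; they are not part of this changeset or of the proposed API.

.. code-block:: python

    # Toy illustration of the semantics above: a hyper-parameter set at
    # construction time, an internal state, 'train' for whole-dataset use,
    # 'adapt' for online use, and 'predict' to exploit the learned state.
    class MeanLearner(object):
        def __init__(self, learning_rate=0.1):   # hyper-parameter
            self.learning_rate = learning_rate
            self.mean = 0.0                       # internal (learned) state

        def train(self, dataset):
            """Exploit the whole dataset at once, as the hyper-parameters dictate."""
            for example in dataset:
                self.adapt(example)

        def adapt(self, data):
            """Online update from a single (x, y) example."""
            x, y = data
            self.mean += self.learning_rate * (y - self.mean)

        def predict(self, x):
            return self.mean

    learner = MeanLearner()
    learner.train([(0, 1.0), (1, 3.0), (2, 2.0)])   # batch-style use
    print(learner.predict(5))                       # predict y for a new x

Written this way, 'train' is just a controlled loop over 'adapt', which is exactly the relationship the text describes between the batch and online cases.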
@@ -67,15 +68,18 @@
   * [prediction,costs] = learner.predict_and_adapt((x,y))

 * Some learners could include in their internal state not only what they
-have learned but some information about recently seen examples that conditions
-the expected distribution of upcoming examples. In that case, they might
-be used, e.g. in an online setting as follows:
+  have learned but some information about recently seen examples that conditions
+  the expected distribution of upcoming examples. In that case, they might
+  be used, e.g. in an online setting as follows:
+
+.. code-block:: python
+
    for (x,y) in data_stream:
       [prediction,costs]=learner.predict((x,y))
       accumulate_statistics(prediction,costs)

 * In some cases, each example is itself a (possibly variable-size) sequence
-or other variable-size object (e.g. an image, or a video)
+  or other variable-size object (e.g. an image, or a video)


@@ -187,6 +191,8 @@
 An object that allows us to explore the graph discussed above. Specifically, it represents
 an explored node in that graph.

+.. code-block:: python
+
     def active_instructions()
         """ Return a list/set of Instruction instances (see below) that the Learner is prepared
         to handle.
@@ -207,6 +213,8 @@
 An object that represents a potential edge in the graph discussed above. It is an
 operation that a learner can perform.

+.. code-block:: python
+
     arg_types
         """a list of Type object (see below) indicating what args are required by execute"""

@@ -228,6 +236,8 @@
 It is not necessary that a Type specifies exactly which arguments are legal, but it should
 `include` all legal arguments, and exclude as many illegal ones as possible.

+.. code-block:: python
+
     def includes(value):
         """return True if value is a legal argument"""

@@ -318,17 +328,23 @@
 I'm wondering what's the benefit of such an API compared to simply defining a
 new method for each instruction. It seems to me that typically, the 'execute'
 method would end up being something like
+
+.. code-block:: python
+
     if instruction == 'do_x':
         self.do_x(..)
     elif instruction == 'do_y':
         self.do_y(..)
     ...
+
 so why not directly call do_x / do_y instead?

 JB replies: I agree with you, and in the implementation of a Learner I suggest
 using Python decorators to get the best of both worlds:

+.. code-block:: python
+
 class NNet(Learner):

     ...
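To make the decorator idea concrete, here is one possible sketch. The patch elides the actual NNet example, so the instruction decorator and the execute / active_instructions helpers below are illustrative assumptions, not the proposed implementation.

.. code-block:: python

    def instruction(fn):
        """Decorator marking a method as an instruction the generic API can call."""
        fn.is_instruction = True
        return fn

    class Learner(object):
        def active_instructions(self):
            """Collect every method tagged with @instruction."""
            return [name for name in dir(self)
                    if getattr(getattr(self, name), 'is_instruction', False)]

        def execute(self, name, *args, **kwargs):
            """Generic dispatch; equivalent to the if/elif chain quoted above."""
            if name not in self.active_instructions():
                raise ValueError('unknown instruction: %s' % name)
            return getattr(self, name)(*args, **kwargs)

    class NNet(Learner):
        @instruction
        def do_x(self, x):
            return x          # placeholder body

        @instruction
        def do_y(self, y):
            return y          # placeholder body

    net = NNet()
    net.do_x(1)               # direct method call
    net.execute('do_x', 1)    # same call through the generic instruction API

Registering methods this way keeps the generic instruction API and the plain method calls in sync without hand-maintaining an if/elif dispatch chain.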
@@ -429,15 +445,16 @@
 I think having a DAG is useful in many ways (all this are things that one
 might think about implementing in a far future, I'm not proposing to implement
 them unless we want to use them - like the reconstruction ):
+
 * there exist the posibility of writing optimizations ( theano style )
 * there exist the posibility to add global view utility functions ( like
-  a reconstruction function for SdA - extremely low level here), or global
-  view diagnostic tools
+  a reconstruction function for SdA - extremely low level here), or global
+  view diagnostic tools
 * the posibility of creating a GUI ( where you just create the Graph by
-  picking transforms and variables from a list ) or working interactively
-  and then generating code that will reproduce the graph
+  picking transforms and variables from a list ) or working interactively
+  and then generating code that will reproduce the graph
 * you can view the graph and different granularity levels to understand
-  things ( global diagnostics)
+  things ( global diagnostics)

 We should have a taxonomy of possible classes of functions and possible
 classes of variables, but those should not be exclusive. We can work at a high
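As a purely illustrative aside, a toy version of such a transform/variable graph might look like the sketch below; none of these classes exist in the codebase, and all names are invented.

.. code-block:: python

    # Toy DAG of named transforms over named variables, to make the
    # "global view" idea concrete (optimization passes, reconstruction
    # utilities, or a GUI could all be built on queries like producers_of).
    class Variable(object):
        def __init__(self, name):
            self.name = name

    class Transform(object):
        def __init__(self, name, inputs, outputs):
            self.name = name          # e.g. 'encoder', 'decoder'
            self.inputs = inputs      # list of Variable
            self.outputs = outputs    # list of Variable

    class Graph(object):
        def __init__(self, transforms):
            self.transforms = transforms

        def producers_of(self, variable):
            """Which transforms produce this variable?"""
            return [t for t in self.transforms if variable in t.outputs]

    x, h, y = Variable('x'), Variable('h'), Variable('y')
    g = Graph([Transform('encoder', [x], [h]), Transform('decoder', [h], [y])])
    print([t.name for t in g.producers_of(h)])   # ['encoder']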
diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/main_plan.txt
--- a/doc/v2_planning/main_plan.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/main_plan.txt Fri Sep 17 20:55:18 2010 -0400
@@ -3,7 +3,7 @@
 ==========

 Yoshua (points discussed Thursday Sept 2, 2010 at LISA tea-talk)
-------
+----------------------------------------------------------------

 ****** Why we need to get better organized in our code-writing ******

@@ -151,6 +151,7 @@
 Another thing to consider related to datasets is that there are a number of
 other efforts to have standard ML datasets, and we should be aware of them,
 and compatible with them when it's easy:
+
 - mldata.org   (they have a file format, not sure how many use it)
 - weka         (ARFF file format)
 - scikits.learn
@@ -168,10 +169,10 @@
 Yoshua (about ideas proposed by Pascal Vincent a while ago):

   - we may want to distinguish between datasets and tasks: a task defines
-  not just the data but also things like what is the input and what is the
-  target (for supervised learning), and *importantly* a set of performance metrics
-  that make sense for this task (e.g. those used by papers solving a particular
-  task, or reported for a particular benchmark)
+    not just the data but also things like what is the input and what is the
+    target (for supervised learning), and *importantly* a set of performance metrics
+    that make sense for this task (e.g. those used by papers solving a particular
+    task, or reported for a particular benchmark)
   - we should discuss about a few "standards" that datasets and tasks may comply to, such as
     - "input" and "target" fields inside each example, for supervised or semi-supervised learning tasks

diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/neural_net.txt
--- a/doc/v2_planning/neural_net.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/neural_net.txt Fri Sep 17 20:55:18 2010 -0400
@@ -11,7 +11,7 @@

 Objective ( Razvan)
----------
+-------------------

 Come up with a description of how to write learners ( how to combine
 optimizer, structure, error measure, how to talk to datasets, tasks ( if there

diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/optimization.txt
--- a/doc/v2_planning/optimization.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/optimization.txt Fri Sep 17 20:55:18 2010 -0400
@@ -64,6 +64,7 @@
 matter if we are just wrapping a theano-based algorithm (that already has to
 handle multiple parameters), and avoiding useless data copies on each call to
 f / df can only help speed-wise.
+
 JB replies: Done, I added possibility that x0 is list of ndarrays to the api
 doc.

@@ -86,6 +87,8 @@
 OD: I wish we could get closer to each other the Theano and Numpy interfaces.
 It would be nice if we could do something like:

+.. code-block:: python
+
     # Theano version.
     updates = sgd([p], gradients=[g], stop=stop, step_size=.1)
     sgd_step = theano.function([input_var, target_var], [], updates=updates)
@@ -101,6 +104,8 @@

 where sgd would look something like:

+.. code-block:: python
+
     class sgd(...):
         def __init__(self, parameters, cost=None, gradients=None, stop=None,
                      step_size=None):
@@ -117,6 +122,8 @@

 Then a wrapper to provide a scipy-like interface could be:

+.. code-block:: python
+
     def minimize(x0, f, df, algo, **kw):
         stop = numpy.array(0, dtype=numpy.int8)
         algo_step = eval(algo)([x0], cost=f, gradients=lambda x: (df(x), ),
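For reference, here is a self-contained numpy-only sketch of the scipy-like wrapper idea discussed in the optimization.txt hunks above. The sgd and minimize names mirror the proposal, but the bodies (including the extra n_steps argument and the dispatch dict used instead of eval) are assumptions for illustration, not the actual API.

.. code-block:: python

    import numpy

    class sgd(object):
        def __init__(self, parameters, cost=None, gradients=None, stop=None,
                     step_size=None):
            self.parameters = parameters
            self.gradients = gradients
            self.step_size = step_size

        def step(self):
            # One plain gradient step on each parameter array, in place.
            for p, g in zip(self.parameters, self.gradients(*self.parameters)):
                p -= self.step_size * g

    def minimize(x0, f, df, algo, n_steps=100, **kw):
        x = x0.copy()
        stepper = {'sgd': sgd}[algo]([x], cost=f,
                                     gradients=lambda x: (df(x),), **kw)
        for _ in range(n_steps):
            stepper.step()
        return x

    # Minimize f(x) = ||x - 1||^2; the minimum is the all-ones vector.
    x_min = minimize(numpy.zeros(3), f=lambda x: ((x - 1) ** 2).sum(),
                     df=lambda x: 2 * (x - 1), algo='sgd', step_size=0.1)
    print(x_min)

Mapping the algorithm name through a small dict rather than eval keeps the same string-based selection while avoiding evaluation of arbitrary code.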
diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/sampler.txt
--- a/doc/v2_planning/sampler.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/sampler.txt Fri Sep 17 20:55:18 2010 -0400
@@ -44,6 +44,6 @@
 =================

 * MCMC methods have a usage pattern that is quite different from the kind of univariate sampling methods
-needed for nice-and-easy parametric families.
+  needed for nice-and-easy parametric families.


diff -r 073c2fab7bcd -r 0e12ea6ba661 doc/v2_planning/use_cases.txt
--- a/doc/v2_planning/use_cases.txt Fri Sep 17 20:24:30 2010 -0400
+++ b/doc/v2_planning/use_cases.txt Fri Sep 17 20:55:18 2010 -0400
@@ -56,8 +56,9 @@

 There are many ways that the training could be configured, but here is one:

+.. code-block:: python

-vm.call(
+   vm.call(
     halflife_stopper(
         # OD: is n_hidden supposed to be n_classes instead?
         initial_model=random_linear_classifier(MNIST.n_inputs, MNIST.n_hidden,
                                                r_seed=234432),
@@ -108,22 +109,25 @@
 regularly had issues in PLearn with the fact we had for instance to give the
 number of inputs when creating a neural network. I much prefer when this kind
 of thing can be figured out at runtime:
-  - Any parameter you can get rid of is a significant gain in
-    user-friendliness.
-  - It's not always easy to know in advance e.g. the dimension of your input
-    dataset. Imagine for instance this dataset is obtained in a first step
-    by going through a PCA whose number of output dimensions is set so as to
-    keep 90% of the variance.
-  - It seems to me it fits better the idea of a symbolic graph: my intuition
-    (that may be very different from what you actually have in mind) is to
-    see an experiment as a symbolic graph, which you instantiate when you
-    provide the input data. One advantage of this point of view is it makes
-    it natural to re-use the same block components on various datasets /
-    splits, something we often want to do.
+
+- Any parameter you can get rid of is a significant gain in
+  user-friendliness.
+- It's not always easy to know in advance e.g. the dimension of your input
+  dataset. Imagine for instance this dataset is obtained in a first step
+  by going through a PCA whose number of output dimensions is set so as to
+  keep 90% of the variance.
+- It seems to me it fits better the idea of a symbolic graph: my intuition
+  (that may be very different from what you actually have in mind) is to
+  see an experiment as a symbolic graph, which you instantiate when you
+  provide the input data. One advantage of this point of view is it makes
+  it natural to re-use the same block components on various datasets /
+  splits, something we often want to do.

 K-fold cross validation of a classifier
 ---------------------------------------

+.. code-block:: python
+
 splits = kfold_cross_validate(
     # OD: What would these parameters mean?
     indexlist = range(1000)
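The kfold_cross_validate call above is truncated and its parameters are explicitly an open question in the text, so the following is only a generic, hypothetical k-fold splitter offered to ground the discussion; the kfold_splits name and its return convention are assumptions, not the proposed interface.

.. code-block:: python

    def kfold_splits(indexlist, K):
        """Yield (train_indices, test_indices) pairs for K folds."""
        folds = [indexlist[i::K] for i in range(K)]
        for k in range(K):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test

    for train_idx, test_idx in kfold_splits(list(range(1000)), K=10):
        # train a classifier on train_idx, evaluate it on test_idx
        pass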