comparison doc/v2_planning/existing_python_ml_libraries.txt @ 1207:53937045f6c7

Pasted content of email sent by Ian about existing python ML libraries
author Olivier Delalleau <delallea@iro>
date Tue, 21 Sep 2010 10:58:14 -0400
* libsvm python bindings Ian (but could trade)
* scikits.learn Guillaume (but could trade)

Also check out http://scipy.org/Topical_Software#head-fc5493250d285f5c634e51be7ba0f80d5f4d6443
- scipy.org's ``topical software'' section on Artificial Intelligence and Machine Learning


Email sent by IG to lisa_labo
-----------------------------

The Existing Libraries committee has finished meeting. We have three
sets of recommendations:
1. Recommendations for designing pylearn based on features we like
from other libraries
2. Recommendations for distributing pylearn with other libraries
3. Recommendations for implementations to wrap

1. Features we liked from other libraries include:
-Most major libraries, such as MDP, PyML, scikit.learn, and pybrain,
offer some way of making a DAG that specifies a feedforward
architecture (Monte Python does this and allows backprop as well). We
will probably have a similar structure but with more features on top
of it, such as joint training. One nice feature of MDP is the ability
to visualize this structure in an HTML document.
-Dataset abstractions handled by transformer nodes: rather than
defining several "views" or "casts" of datasets, most of these
libraries (particularly MDP and scikit.learn) let you put in
whatever kind of data you want, and then have the processing nodes in
your DAG format the data correctly for the later parts of the DAG.
This makes it easy to use several transformed versions of the dataset
(like chopping images up into small patches) without pylearn having to
include functionality for all of these possible transformations.
-MDP and scikit.learn both have a clearly defined inheritance
structure, with a small number of root-level superclasses exposing
most of the functionality of the library through their method
signatures.
-Checkpoints: MDP allows the user to specify arbitrary callbacks to
run at various points during training or processing. This is mainly
designed to let the user save state for crash recovery
purposes, but it could have other uses, like visualizing the evolution
of the weights over time.
-MDP includes an interface for learners to declare that they can learn
in parallel, i.e. the same object can look at different data on
different CPU cores. This is not useful for SGD-based models but could
be nice for PCA/SFA-type models (which is most of what MDP
implements).
-Monte Python has humorously named classes, such as the 'gym', which
is the package that contains all of the 'trainers'.
-PyML has support for sparse datasets.
-PyML has an 'aggregate dataset' that can combine other datasets.
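The "DAG of transformer nodes" and checkpoint-callback ideas above can be sketched in a few lines. This is a hypothetical illustration, not MDP's or scikit.learn's actual API: the `Node`, `Flow`, `PatchExtractor`, and `MeanCenter` names are invented here, and the real libraries' interfaces differ.

```python
# Hypothetical sketch of a linear DAG of transformer nodes with an
# optional checkpoint callback; names are invented, not any real API.

class Node:
    """Base class: a node transforms data, and may optionally learn."""
    def train(self, data):
        pass  # stateless nodes have nothing to learn

    def execute(self, data):
        raise NotImplementedError

class PatchExtractor(Node):
    """Stateless transformer: chop each 'image' (a flat list) into patches."""
    def __init__(self, patch_size):
        self.patch_size = patch_size

    def execute(self, data):
        n = self.patch_size
        return [img[i:i + n] for img in data for i in range(0, len(img), n)]

class MeanCenter(Node):
    """Learning node: subtract the mean value seen during training."""
    def __init__(self):
        self.mean = 0.0

    def train(self, data):
        flat = [x for row in data for x in row]
        self.mean = sum(flat) / len(flat)

    def execute(self, data):
        return [[x - self.mean for x in row] for row in data]

class Flow:
    """Train nodes in order, feeding each the output of its predecessors;
    a checkpoint callback (e.g. for crash recovery) runs after each node."""
    def __init__(self, nodes, checkpoint=None):
        self.nodes = nodes
        self.checkpoint = checkpoint

    def train(self, data):
        for node in self.nodes:
            node.train(data)
            data = node.execute(data)
            if self.checkpoint is not None:
                self.checkpoint(node)  # e.g. pickle node state to disk
        return data

    def execute(self, data):
        for node in self.nodes:
            data = node.execute(data)
        return data

images = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
flow = Flow([PatchExtractor(2), MeanCenter()],
            checkpoint=lambda node: None)
flow.train(images)
patches = flow.execute(images)
```

The point of the pattern is that pylearn's core never needs a "patch view" of a dataset: patch extraction is just another node, and later nodes see whatever format their predecessors emit.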


2. Recommendations for distributing pylearn

pylearn appears to be the most ambitious of all existing python
machine learning projects. There is no established machine learning
library whose scope is broad enough for us to contribute to that
library rather than developing our own.

Some libraries are frequently cited in the literature and
well-respected. One example is libsvm. We should wrap these libraries
so that pylearn users can run experiments with the most
well-established and credible implementations possible.

Wrapping 3rd-party libraries may present some issues with licensing.
We expect to release pylearn under the BSD license (so that business
partners such as Ubisoft can use it in shipped products), but much of
the code we want to wrap may be released under the GPL or some other
license that prevents inclusion in a BSD project. We therefore propose
to keep only core functionality in pylearn itself, and to put most
implementations of actual algorithms into separate packages. One
package could provide a set of BSD-licensed plugins developed by us or
based on wrapping BSD-licensed 3rd-party libraries, and another
package could provide a set of GPL-licensed plugins developed by
wrapping GPL'ed code.
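One way the package split could work in practice is for core pylearn to define only a plugin registry and discover whichever license-specific packages happen to be installed. A minimal sketch, assuming hypothetical package names ("pylearn_bsd_plugins", "pylearn_gpl_plugins") that are not real distributions:

```python
# Sketch of optional plugin-package discovery; the package names below
# are hypothetical stand-ins for the proposed BSD and GPL plugin sets.

import importlib

PLUGIN_PACKAGES = ["pylearn_bsd_plugins", "pylearn_gpl_plugins"]

def load_plugins():
    """Return a {name: learner_class} dict from every installed
    plugin package.

    Because the GPL'ed wrappers live in a separate, optional package,
    the core (BSD) distribution never ships GPL code itself: users who
    want those wrappers install the GPL package alongside pylearn.
    """
    registry = {}
    for pkg_name in PLUGIN_PACKAGES:
        try:
            pkg = importlib.import_module(pkg_name)
        except ImportError:
            continue  # that plugin package is simply not installed
        # assume each plugin package exposes a LEARNERS dict
        registry.update(getattr(pkg, "LEARNERS", {}))
    return registry

learners = load_plugins()  # empty if no plugin package is present
```

The design choice here is that licensing is resolved at install time rather than in the codebase: the import boundary is the license boundary.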

3. Recommendations for implementations to wrap

shogun:
  large-scale kernel learning (mostly SVMs). This wraps other
  libraries we should definitely be interested in, such as libsvm
  (because it is well-established) and others that get state-of-the-art
  performance or are good for extremely large datasets, etc.
milk:
  k-means
  SVMs with arbitrary python types for kernel arguments
pybrain:
  LSTM
mlpy:
  feature selection
mdp:
  ICA
  LLE
scikit.learn:
  lasso
  nearest neighbor
  isomap
  various metrics
  mean shift
  cross-validation
  LDA
  HMMs
Yet Another Python Graph Library:
  graph similarity functions that could be useful if we want to
  learn with graphs as data
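Whatever gets wrapped, each wrapper would likely be a thin adapter that translates the third party's calling conventions into a single pylearn-style interface. A sketch of that pattern, using an invented `ThirdPartySVM` stub in place of a real binding (libsvm's actual python API differs, so only the adapter shape is the point here):

```python
# Adapter-pattern sketch for wrapping a third-party learner behind a
# common interface; all names here are hypothetical, not a real API.

class ThirdPartySVM:
    """Stub for an external library with its own calling conventions."""
    def __init__(self):
        self.model = None

    def do_training(self, labels, vectors, options):
        # stand-in for real training: memorize the majority label
        self.model = max(set(labels), key=labels.count)

    def do_prediction(self, vectors):
        return [self.model for _ in vectors]

class Learner:
    """Hypothetical common pylearn-style interface."""
    def train(self, inputs, targets):
        raise NotImplementedError

    def predict(self, inputs):
        raise NotImplementedError

class WrappedSVM(Learner):
    """Adapter: map the common interface onto the wrapped library,
    hiding its argument order and option-string conventions."""
    def __init__(self, options=""):
        self._impl = ThirdPartySVM()
        self.options = options

    def train(self, inputs, targets):
        self._impl.do_training(targets, inputs, self.options)

    def predict(self, inputs):
        return self._impl.do_prediction(inputs)

svm = WrappedSVM()
svm.train([[0.0], [1.0], [1.1]], [0, 1, 1])
preds = svm.predict([[0.5]])
```

Because every wrapper presents the same `train`/`predict` surface, experiments can swap a libsvm-backed model for a shogun- or milk-backed one without changing the surrounding code.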