annotate doc/v2_planning/existing_python_ml_libraries.txt @ 1419:cff305ad9f60

TensorFnDataset - added x_ attribute that caches the dataset function return value, but does not get pickled.
author James Bergstra <bergstrj@iro.umontreal.ca>
date Fri, 04 Feb 2011 16:05:22 -0500
parents f5e9c00a67d7
children

Committee members: GD, DWF, IG, DE

This committee will investigate the possibility of interfacing with and/or
borrowing from other Python machine learning libraries that already exist.
Some questions that we need to answer:

* How much should we try to interface with other libraries?
* Which parts can and should we implement ourselves, and which should we
  leave to the other libraries?

Preliminary list of libraries to look at:

* PyBrain: Razvan
* MDP: Ian
* Orange (http://www.ailab.si/orange/): Ian (but could trade)
* PyML (http://pyml.sourceforge.net/)
* mlpy (https://mlpy.fbk.eu/): Dumitru
* APGL (http://packages.python.org/apgl/): Dumitru
* MontePython (http://montepython.sourceforge.net/): Guillaume (but could trade)
* Shogun Python bindings
* libsvm Python bindings: Ian (but could trade)
* scikits.learn: Guillaume (but could trade)

Also check out the "Topical Software" section of scipy.org on Artificial
Intelligence and Machine Learning:
http://scipy.org/Topical_Software#head-fc5493250d285f5c634e51be7ba0f80d5f4d6443


Email sent by IG to lisa_labo
-----------------------------

The Existing Libraries committee has finished meeting. We have three
sets of recommendations:

1. Recommendations for designing pylearn based on features we like
   from other libraries
2. Recommendations for distributing pylearn with other libraries
3. Recommendations for implementations to wrap

1. Features we liked from other libraries include:

- Most major libraries, such as MDP, PyML, scikits.learn, and PyBrain,
  offer some way of building a DAG that specifies a feedforward
  architecture (MontePython does this and allows backprop as well). We
  will probably have a similar structure, but with more features on top
  of it, such as joint training. One nice feature of MDP is the ability
  to visualize this structure in an HTML document.
- Dataset abstractions handled by transformer nodes: rather than
  defining several "views" or "casts" of datasets, most of these
  libraries (particularly MDP and scikits.learn) let you put in
  whatever kind of data you want, and then have the processing nodes in
  your DAG format the data correctly for the later parts of the DAG.
  This makes it easy to use several transformed versions of the dataset
  (such as images chopped up into small patches) without pylearn having
  to include functionality for every possible transformation.
- MDP and scikits.learn both have a clearly defined inheritance
  structure, with a small number of root-level superclasses exposing
  most of the functionality of the library through their method
  signatures.
- Checkpoints: MDP allows the user to specify arbitrary callbacks to
  run at various points during training or processing. This is mainly
  designed to let the user save state for crash recovery, but could
  have other uses, such as visualizing the evolution of the weights
  over time.
- MDP includes an interface for learners to declare that they can learn
  in parallel, i.e., the same object can look at different data on
  different CPU cores. This is not useful for SGD-based models, but could
  be nice for PCA/SFA-type models (which make up most of what MDP
  implements).
- MontePython has humorously named classes, such as the 'gym', which
  is the package that contains all of the 'trainers'.
- PyML supports sparse datasets.
- PyML has an 'aggregate dataset' that can combine other datasets.
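
The DAG, transformer-node, small-inheritance-tree and checkpoint ideas above can be combined in one small sketch. This is only an illustration under our own assumptions, not MDP's or scikits.learn's actual API: the classes ``Node``, ``PatchNode``, ``CenterNode`` and ``Flow`` are hypothetical, the DAG is restricted to a linear chain, and images are flattened to 1-D lists so that no third-party package is needed.

```python
from typing import Callable, List, Optional, Sequence


class Node:
    """Hypothetical root-level superclass: every processing step exposes
    the same two methods, so a flow can treat all nodes uniformly."""

    def train(self, data: Sequence) -> None:
        """Default: nodes that need no training do nothing."""

    def execute(self, data: Sequence) -> list:
        raise NotImplementedError


class PatchNode(Node):
    """Transformer node: chops each 1-D 'image' into non-overlapping
    patches of a fixed size (leftover values are dropped), so later
    nodes see patches instead of whole images."""

    def __init__(self, size: int):
        self.size = size

    def execute(self, data):
        patches = []
        for image in data:
            for start in range(0, len(image) - self.size + 1, self.size):
                patches.append(list(image[start:start + self.size]))
        return patches


class CenterNode(Node):
    """Trainable node: learns the per-dimension mean and subtracts it."""

    def __init__(self):
        self.mean: Optional[List[float]] = None

    def train(self, data):
        n = len(data)
        self.mean = [sum(col) / n for col in zip(*data)]

    def execute(self, data):
        return [[x - m for x, m in zip(row, self.mean)] for row in data]


class Flow:
    """A linear chain (the simplest DAG) of nodes. Each node is trained on
    the output of the nodes before it, and every checkpoint callback is
    called with (node_index, node) right after that node is trained,
    e.g. to pickle state for crash recovery."""

    def __init__(self, nodes: List[Node],
                 checkpoints: Optional[List[Callable[[int, Node], None]]] = None):
        self.nodes = nodes
        self.checkpoints = checkpoints or []

    def train(self, data):
        for i, node in enumerate(self.nodes):
            node.train(data)
            for callback in self.checkpoints:
                callback(i, node)
            data = node.execute(data)  # feed the next node

    def execute(self, data):
        for node in self.nodes:
            data = node.execute(data)
        return data


if __name__ == "__main__":
    log = []
    flow = Flow([PatchNode(2), CenterNode()],
                checkpoints=[lambda i, node: log.append(i)])
    flow.train([[1, 2, 3, 4], [5, 6, 7, 8]])
    print(log)                          # [0, 1]
    print(flow.execute([[1, 2, 3, 4]]))  # [[-3.0, -3.0], [-1.0, -1.0]]
```

Because the patch extraction is just another node, the dataset itself never needs a "patch view"; swapping ``PatchNode`` for a different transformer changes the data format seen downstream without touching the dataset code.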
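
Parallel learning suits PCA-type models because their training statistics are sums over examples: each worker can accumulate partial sums on its own chunk, and the partial results are merged by addition. The sketch below assumes this sufficient-statistics approach; it does not reproduce MDP's parallel interface, and ``CovarianceStats`` is a hypothetical name. The chunks are processed sequentially here; in practice each ``update`` call could run in a separate process, e.g. via ``concurrent.futures.ProcessPoolExecutor``.

```python
from typing import List, Sequence


class CovarianceStats:
    """Partial sufficient statistics for the mean and covariance (what PCA
    needs). Each worker fills one object from its own chunk of data; the
    objects are then merged by simple addition, so the result does not
    depend on how the data was split."""

    def __init__(self, dim: int):
        self.dim = dim
        self.n = 0
        self.sum = [0.0] * dim
        self.outer = [[0.0] * dim for _ in range(dim)]  # running sum of x x^T

    def update(self, rows: Sequence[Sequence[float]]) -> "CovarianceStats":
        for x in rows:
            self.n += 1
            for i in range(self.dim):
                self.sum[i] += x[i]
                for j in range(self.dim):
                    self.outer[i][j] += x[i] * x[j]
        return self

    def merge(self, other: "CovarianceStats") -> "CovarianceStats":
        out = CovarianceStats(self.dim)
        out.n = self.n + other.n
        out.sum = [a + b for a, b in zip(self.sum, other.sum)]
        out.outer = [[a + b for a, b in zip(r1, r2)]
                     for r1, r2 in zip(self.outer, other.outer)]
        return out

    def covariance(self) -> List[List[float]]:
        """Maximum-likelihood covariance: E[x x^T] - mean mean^T.
        PCA would then use the eigenvectors of this matrix."""
        mean = [s / self.n for s in self.sum]
        return [[self.outer[i][j] / self.n - mean[i] * mean[j]
                 for j in range(self.dim)] for i in range(self.dim)]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    # Two "workers", each seeing half of the data.
    part_a = CovarianceStats(2).update(data[:2])
    part_b = CovarianceStats(2).update(data[2:])
    print(part_a.merge(part_b).covariance())  # [[5.0, 5.0], [5.0, 5.0]]
```

SGD-based models have no such additive decomposition of their updates, which is why the same trick does not carry over to them.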
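
PyML's 'aggregate dataset' is only mentioned in passing above. One way such a combinator could work is as a read-only view that concatenates several datasets without copying them. ``AggregateDataset`` below is a hypothetical sketch, not PyML's class. It maps a global index to a (dataset, local index) pair using cumulative offsets and a binary search.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


class AggregateDataset:
    """Hypothetical read-only concatenation of several datasets.
    Nothing is copied: index i is translated to (dataset k, local index)."""

    def __init__(self, datasets: List[Sequence]):
        self.datasets = datasets
        # offsets[k] = total number of examples in datasets[0..k-1]
        self.offsets = [0] + list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self.offsets[-1]

    def __getitem__(self, i: int):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("AggregateDataset index out of range")
        # bisect_right skips over empty datasets, which share an offset
        k = bisect_right(self.offsets, i) - 1
        return self.datasets[k][i - self.offsets[k]]


if __name__ == "__main__":
    combined = AggregateDataset([["a", "b"], [], ["c"]])
    print(len(combined), list(combined), combined[-1])  # 3 ['a', 'b', 'c'] c
```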

2. Recommendations for distributing pylearn

pylearn appears to be the most ambitious of all existing Python
machine learning projects. There is no established machine learning
library whose scope is broad enough for us to contribute to it
rather than develop our own.

Some libraries, such as libsvm, are frequently cited in the literature
and well respected. We should wrap these libraries so that pylearn users
can run experiments with the most well-established and credible
implementations available.
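
Wrapping could take the form of thin adapter classes that translate a third-party calling convention into a single pylearn-side interface, while the numerical work stays in the established library. In the sketch below, ``Learner``, ``FitPredictWrapper`` and ``MajorityClass`` are hypothetical names. We assume the wrapped object exposes ``fit``/``predict`` methods, as scikits.learn estimators do. Libraries with a different calling convention, such as libsvm's bindings, would need their own adapter. ``MajorityClass`` is a toy stand-in so that the example runs without external packages.

```python
from typing import Any, List, Sequence


class Learner:
    """Hypothetical pylearn-side interface that every wrapped
    implementation is adapted to."""

    def train(self, inputs: Sequence, targets: Sequence) -> None:
        raise NotImplementedError

    def predict(self, inputs: Sequence) -> List:
        raise NotImplementedError


class FitPredictWrapper(Learner):
    """Adapter for a third-party estimator exposing fit()/predict().
    It only renames methods and converts results to plain lists."""

    def __init__(self, estimator: Any):
        self.estimator = estimator

    def train(self, inputs, targets):
        self.estimator.fit(inputs, targets)

    def predict(self, inputs):
        return list(self.estimator.predict(inputs))


class MajorityClass:
    """Toy stand-in for a real third-party estimator."""

    def fit(self, inputs, targets):
        targets = list(targets)
        self.label = max(set(targets), key=targets.count)

    def predict(self, inputs):
        return [self.label for _ in inputs]


if __name__ == "__main__":
    learner = FitPredictWrapper(MajorityClass())
    learner.train([[0.0], [1.0], [2.0]], ["a", "b", "b"])
    print(learner.predict([[5.0], [6.0]]))  # ['b', 'b']
```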

Wrapping third-party libraries may raise licensing issues. We expect to
release pylearn under the BSD license (so that business partners such as
Ubisoft can use it in shipped products), but much of the code we want to
wrap may be released under the GPL or some other license that prevents
inclusion in a BSD project. We therefore propose to keep only core
functionality in pylearn itself, and to put most implementations of
actual algorithms into separate packages. One package could provide a
set of BSD-licensed plugins, developed by us or based on wrapping
BSD-licensed third-party libraries, and another package could provide a
set of GPL-licensed plugins developed by wrapping GPL-licensed code.
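
One way to implement this split is to put a plugin registry in the BSD-licensed core and have the optional packages populate it when they are imported. The core never imports plugin code unless that code is installed. This is our own assumption about the design, not a decided plan. ``register``, ``load_plugins``, ``get_algorithm`` and the package names ``pylearn_bsd_plugins`` / ``pylearn_gpl_plugins`` are all hypothetical, and the sketch addresses code organization only, not the licensing analysis itself.

```python
import importlib
from typing import Callable, Dict, List

# Registry living in the BSD-licensed core; plugins add entries to it.
_REGISTRY: Dict[str, Callable] = {}

# Hypothetical names of separately distributed plugin packages.
PLUGIN_PACKAGES = ["pylearn_bsd_plugins", "pylearn_gpl_plugins"]


def register(name: str) -> Callable[[Callable], Callable]:
    """Decorator that plugin packages apply to their algorithm factories."""
    def decorator(factory: Callable) -> Callable:
        _REGISTRY[name] = factory
        return factory
    return decorator


def load_plugins(packages: List[str] = PLUGIN_PACKAGES) -> List[str]:
    """Import whichever plugin packages are installed (importing a package
    runs its @register decorators). Packages that are absent are skipped."""
    loaded = []
    for pkg in packages:
        try:
            importlib.import_module(pkg)
        except ModuleNotFoundError as err:
            if err.name != pkg:  # plugin exists but one of its dependencies is missing
                raise
            continue
        loaded.append(pkg)
    return loaded


def get_algorithm(name: str) -> Callable:
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"no installed plugin provides {name!r}") from None


if __name__ == "__main__":
    @register("kmeans")  # a plugin package would do this at import time
    def make_kmeans(k=2):
        return {"algorithm": "kmeans", "k": k}

    print(load_plugins())  # [] unless the hypothetical packages are installed
    print(get_algorithm("kmeans")(k=3))  # {'algorithm': 'kmeans', 'k': 3}
```

A user who installs only the core and the BSD plugin package gets a purely BSD-licensed installation. The GPL-licensed algorithms appear in the registry only when the GPL plugin package is also installed.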

3. Recommendations for implementations to wrap

* Shogun:

  * large-scale kernel learning (mostly SVMs). This wraps other
    libraries we should definitely be interested in, such as libsvm
    (because it is well established) and others that achieve
    state-of-the-art performance or are suited to extremely large
    datasets.

* milk:

  * k-means
  * SVMs with arbitrary Python types as kernel arguments

* PyBrain:

  * LSTM

* mlpy:

  * feature selection

* MDP:

  * ICA
  * LLE

* scikits.learn:

  * lasso
  * nearest neighbor
  * Isomap
  * various metrics
  * mean shift
  * cross-validation
  * LDA
  * HMMs

* Yet Another Python Graph Library:

  * graph similarity functions that could be useful if we want to
    learn with graphs as data