Committee members: GD, DWF, IG, DE

This committee will investigate the possibility of interfacing with and/or
borrowing from other existing Python machine learning libraries.
Some questions that we need to answer:

* How much should we try to interface with other libraries?
* What parts can and should we implement ourselves, and what should we leave
  to the other libraries?

Preliminary list of libraries to look at:

* Pybrain                                            Razvan
* MDP                                                Ian
* Orange (http://www.ailab.si/orange/)               Ian (but could trade)
* PyML (http://pyml.sourceforge.net/)
* mlpy (https://mlpy.fbk.eu/)                        Dumitru
* APGL (http://packages.python.org/apgl/)            Dumitru
* MontePython (http://montepython.sourceforge.net/)  Guillaume (but could trade)
* Shogun python bindings
* libsvm python bindings                             Ian (but could trade)
* scikits.learn                                      Guillaume (but could trade)

Also check out the "Topical Software" section of scipy.org on Artificial
Intelligence and Machine Learning:
http://scipy.org/Topical_Software#head-fc5493250d285f5c634e51be7ba0f80d5f4d6443


Email sent by IG to lisa_labo
-----------------------------

The Existing Libraries committee has finished meeting. We have three
sets of recommendations:

1. Recommendations for designing pylearn based on features we like
   from other libraries
2. Recommendations for distributing pylearn with other libraries
3. Recommendations for implementations to wrap

1. Features we liked from other libraries include:

- Most major libraries, such as MDP, PyML, scikits.learn, and PyBrain,
  offer some way of building a DAG that specifies a feedforward
  architecture (Monte Python does this and supports backprop as well). We
  will probably have a similar structure, but with more features on top
  of it, such as joint training. One nice feature of MDP is the ability
  to visualize this structure in an HTML document.
- Dataset abstractions handled by transformer nodes: rather than
  defining several "views" or "casts" of datasets, most of these
  libraries (particularly MDP and scikits.learn) let you put in
  whatever kind of data you want, and then have the processing nodes in
  your DAG format the data correctly for the later parts of the DAG.
  This makes it easy to use several transformed versions of the dataset
  (like chopping images up into small patches) without pylearn having to
  include functionality for all of these possible transformations.
- MDP and scikits.learn both have a clearly defined inheritance
  structure, with a small number of root-level superclasses exposing
  most of the functionality of the library through their method
  signatures.
- Checkpoints: MDP lets the user specify arbitrary callbacks to
  run at various points during training or processing. This is mainly
  designed to let the user save state for crash recovery,
  but could have other uses, such as visualizing the evolution of
  the weights over time.
- MDP includes an interface for learners to declare that they can learn
  in parallel, i.e., the same object can look at different data on
  different CPU cores. This is not useful for SGD-based models, since each
  update depends on the previous one, but could be nice for PCA/SFA-type
  models (which make up most of what MDP implements).
- Monte Python has humorously named classes, such as the 'gym', which
  is the package that contains all of the 'trainers'.
- PyML has support for sparse datasets.
- PyML has an 'aggregate dataset' that can combine other datasets.
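
The first three bullets above (a DAG of nodes, transformer nodes that reformat the data, and a small set of root-level superclasses) can be combined in one minimal sketch. It is loosely inspired by MDP's node/flow style, but the classes `Node`, `PatchExtractor`, `Centerer`, and `Flow` below are hypothetical and do not reproduce MDP's or scikits.learn's actual interfaces. The DAG is reduced to a linear chain, and plain Python lists stand in for arrays so that no third-party package is needed.

```python
from typing import Any, List, Optional, Sequence

Vector = List[float]


class Node:
    """Hypothetical root-level superclass: every processing step exposes
    the same two methods, so nodes can be chained without knowing each
    other's internals."""

    def train(self, data: Sequence[Any]) -> None:
        """Fit internal state from data; stateless nodes keep this no-op."""

    def execute(self, data: Sequence[Any]) -> List[Any]:
        """Transform data for the next node in the chain."""
        raise NotImplementedError


class PatchExtractor(Node):
    """Transformer node: cuts each 2-D image (a list of rows) into
    non-overlapping size x size patches, each flattened to a vector."""

    def __init__(self, size: int) -> None:
        self.size = size

    def execute(self, images: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
        s = self.size
        patches = []
        for img in images:
            for r in range(0, len(img) - s + 1, s):
                for c in range(0, len(img[0]) - s + 1, s):
                    patches.append([img[r + i][c + j]
                                    for i in range(s) for j in range(s)])
        return patches


class Centerer(Node):
    """Trainable node: learns the per-dimension mean, then subtracts it."""

    def __init__(self) -> None:
        self.mean: Optional[Vector] = None

    def train(self, vectors: Sequence[Vector]) -> None:
        n = len(vectors)
        self.mean = [sum(column) / n for column in zip(*vectors)]

    def execute(self, vectors: Sequence[Vector]) -> List[Vector]:
        if self.mean is None:
            raise RuntimeError("Centerer must be trained before execute()")
        return [[x - m for x, m in zip(v, self.mean)] for v in vectors]


class Flow:
    """Linear chain of nodes (the simplest DAG). Each node is trained on
    the output of the already-trained nodes before it."""

    def __init__(self, nodes: Sequence[Node]) -> None:
        self.nodes = list(nodes)

    def train(self, data: Sequence[Any]) -> None:
        for node in self.nodes:
            node.train(data)
            data = node.execute(data)

    def execute(self, data: Sequence[Any]) -> List[Any]:
        for node in self.nodes:
            data = node.execute(data)
        return list(data)


# Usage: one 4x4 "image" becomes four centered 2x2 patches.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
flow = Flow([PatchExtractor(size=2), Centerer()])
flow.train([image])
centered = flow.execute([image])
print(centered)
# [[-5.0, -5.0, -5.0, -5.0], [-3.0, -3.0, -3.0, -3.0], [3.0, 3.0, 3.0, 3.0], [5.0, 5.0, 5.0, 5.0]]
```

Because the patch extractor is just another node, the dataset itself never needs a "patch view". Swapping in a different transformation means swapping one node.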
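
The checkpoint bullet above can be illustrated with a training loop that accepts arbitrary callbacks. This is a minimal sketch and not MDP's actual callback mechanism. The `Trainer` class, its `every` parameter, and the toy one-weight model trained by SGD are all hypothetical. One callback writes a JSON checkpoint for crash recovery. The other records the weight at each checkpoint so that its evolution can be plotted later.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]
Callback = Callable[[int, State], None]


class Trainer:
    """Hypothetical trainer that calls every callback after each
    `every`-th epoch with the epoch number and a copy of the state."""

    def __init__(self, callbacks: Sequence[Callback], every: int = 1) -> None:
        self.callbacks = list(callbacks)
        self.every = every

    def fit(self, data: Sequence[Tuple[float, float]],
            lr: float = 0.05, epochs: int = 10) -> State:
        # Toy model y = w * x, trained by SGD on the squared error.
        state: State = {"w": 0.0}
        for epoch in range(1, epochs + 1):
            for x, y in data:
                grad = 2.0 * (state["w"] * x - y) * x  # d/dw of (w*x - y)**2
                state["w"] -= lr * grad
            if epoch % self.every == 0:
                for callback in self.callbacks:
                    callback(epoch, dict(state))
        return state


def make_checkpointer(path: str) -> Callback:
    """Return a callback that saves the epoch and state to `path`."""
    def save(epoch: int, state: State) -> None:
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"epoch": epoch, "state": state}, f)
        # os.replace is atomic on POSIX, so a crash during the write
        # leaves the previous checkpoint intact.
        os.replace(tmp, path)
    return save


# Usage: checkpoint every 2 epochs and keep a history of w for plotting.
history: List[float] = []
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
trainer = Trainer([make_checkpointer(ckpt_path),
                   lambda epoch, state: history.append(state["w"])],
                  every=2)
final = trainer.fit([(1.0, 2.0), (2.0, 4.0)], lr=0.05, epochs=10)
```

Passing a copy of the state keeps callbacks from accidentally mutating the model, and the trainer needs no knowledge of what the callbacks do with it.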
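
The parallel-learning bullet above relies on a property that PCA has and SGD lacks. Fitting PCA only needs sums over the data (the count, the sum of the vectors, and the sum of their outer products), and sums can be computed on separate chunks and added together in any order. The sketch below shows this for the covariance matrix that PCA diagonalizes. The function names are hypothetical, and the final eigendecomposition (for example with `numpy.linalg.eigh`) is omitted. A thread pool is used only to keep the example self-contained: pure-Python arithmetic holds the GIL, so real speedups would need a process pool or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Stats = Tuple[int, List[float], Matrix]  # (count, sum of x, sum of x x^T)


def chunk_stats(chunk: Sequence[Sequence[float]]) -> Stats:
    """Sufficient statistics of one chunk; chunks are independent."""
    d = len(chunk[0])
    total = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for v in chunk:
        for i in range(d):
            total[i] += v[i]
            for j in range(d):
                outer[i][j] += v[i] * v[j]
    return len(chunk), total, outer


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two chunks' statistics by simple addition."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])])


def covariance(stats: Stats) -> Matrix:
    """Population covariance E[x x^T] - mean mean^T from merged statistics."""
    n, total, outer = stats
    mean = [t / n for t in total]
    d = len(mean)
    return [[outer[i][j] / n - mean[i] * mean[j] for j in range(d)]
            for i in range(d)]


# Usage: three chunks, each handled by its own worker, then merged.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.5], [4.0, 8.0], [5.0, 9.5], [6.0, 12.0]]
chunks = [data[0:2], data[2:4], data[4:6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_stats = list(pool.map(chunk_stats, chunks))
cov_parallel = covariance(reduce(merge, partial_stats))
cov_serial = covariance(chunk_stats(data))
```

An SGD update, by contrast, reads the weights written by the previous update, so its steps cannot simply be split across chunks and summed afterwards.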

2. Recommendations for distributing pylearn

pylearn appears to be the most ambitious of all existing Python
machine learning projects. There is no established machine learning
library whose scope is broad enough for us to contribute to it
rather than develop our own.

Some libraries, such as libsvm, are frequently cited in the literature
and well respected. We should wrap these libraries so that pylearn users
can run experiments with the most well-established and credible
implementations possible.

Wrapping third-party libraries may raise licensing issues.
We expect to release pylearn under the BSD license (so that business
partners such as Ubisoft can use it in shipped products), but much of
the code we want to wrap may be released under the GPL or some other
license that prevents inclusion in a BSD-licensed project. We therefore
propose to keep only core functionality in pylearn itself, and to put most
implementations of actual algorithms into separate packages. One
package could provide a set of BSD-licensed plugins developed by us or
based on wrapping BSD-licensed third-party libraries, and another
package could provide a set of GPL-licensed plugins developed by
wrapping GPL'ed code.
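
One way to realize the proposed split is for pylearn's core to define only the interfaces plus a registry, while each separately distributed package registers its own implementations and declares its license. The sketch below assumes a hypothetical `register`/`available`/`create` registry and hypothetical plugin classes, shown in one file for brevity. A real implementation might instead discover plugins through packaging entry points (for example `importlib.metadata.entry_points(group=...)` in Python 3.10+). The point is only that the core never imports plugin code itself.

```python
from typing import Any, Callable, Dict, Iterable, List

# --- in pylearn's core (hypothetical module) ---------------------------
_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(name: str, license_name: str) -> Callable[[Any], Any]:
    """Decorator a plugin package uses to announce an implementation."""
    def decorator(factory: Any) -> Any:
        if name in _REGISTRY:
            raise ValueError(f"duplicate plugin name: {name!r}")
        _REGISTRY[name] = {"factory": factory, "license": license_name}
        return factory
    return decorator


def available(allowed_licenses: Iterable[str] = ("BSD",)) -> List[str]:
    """Names of registered plugins whose license the caller accepts."""
    allowed = set(allowed_licenses)
    return sorted(name for name, entry in _REGISTRY.items()
                  if entry["license"] in allowed)


def create(name: str, **kwargs: Any) -> Any:
    """Instantiate a registered plugin by name."""
    return _REGISTRY[name]["factory"](**kwargs)


# --- in a separate BSD-licensed plugin package (hypothetical) ----------
@register("kmeans", license_name="BSD")
class KMeans:
    def __init__(self, k: int = 8) -> None:
        self.k = k


# --- in a separate GPL-licensed plugin package (hypothetical) ----------
@register("wrapped_gpl_svm", license_name="GPL")
class WrappedGplSvm:
    def __init__(self, c: float = 1.0) -> None:
        self.c = c  # would be forwarded to the wrapped GPL library


# Usage: a BSD-only deployment never sees the GPL plugin.
print(available())                # ['kmeans']
print(available(("BSD", "GPL")))  # ['kmeans', 'wrapped_gpl_svm']
model = create("kmeans", k=3)
```

If the GPL package is never installed, its `register` call never runs, and a BSD-only distribution of pylearn contains no GPL code at all.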

3. Recommendations for implementations to wrap

* shogun:

  * large-scale kernel learning (mostly SVMs). This wraps other
    libraries we should definitely be interested in, such as libsvm
    (because it is well established) and others that get state-of-the-art
    performance or are good for extremely large datasets.

* milk:

  * k-means
  * SVMs with arbitrary Python types for kernel arguments

* pybrain:

  * LSTM

* mlpy:

  * feature selection

* mdp:

  * ICA
  * LLE

* scikits.learn:

  * lasso
  * nearest neighbors
  * Isomap
  * various metrics
  * mean shift
  * cross-validation
  * LDA
  * HMMs

* Yet Another Python Graph Library:

  * graph similarity functions that could be useful if we want to
    learn with graphs as data