doc/v2_planning/existing_python_ml_libraries.txt @ 1207:53937045f6c7
Pasted content of email sent by Ian about existing python ML libraries

author    Olivier Delalleau <delallea@iro>
date      Tue, 21 Sep 2010 10:58:14 -0400
parents   0e12ea6ba661
children  e5b7a7913329
* libsvm python bindings  Ian (but could trade)
* scikits.learn  Guillaume (but could trade)

Also check out http://scipy.org/Topical_Software#head-fc5493250d285f5c634e51be7ba0f80d5f4d6443
- scipy.org's ``topical software'' section on Artificial Intelligence and Machine Learning


Email sent by IG to lisa_labo
-----------------------------

The Existing Libraries committee has finished meeting. We have three
sets of recommendations:
1. Recommendations for designing pylearn based on features we like
from other libraries
2. Recommendations for distributing pylearn with other libraries
3. Recommendations for implementations to wrap

1. Features we liked from other libraries include:
- Most major libraries, such as MDP, PyML, scikit.learn, and pybrain,
offer some way of making a DAG that specifies a feedforward
architecture (Monte Python does this and allows backprop as well). We
will probably have a similar structure but with more features on top
of it, such as joint training. One nice feature of MDP is the ability
to visualize this structure in an HTML document.
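The feedforward-DAG idea above can be sketched in a few lines of plain Python. This is an illustrative toy in the spirit of an mdp Flow or a scikit.learn pipeline; the `Node` and `Flow` names and their methods are assumptions for this sketch, not any library's actual API.

```python
# Hypothetical sketch of a feedforward processing chain: each node is
# trained in order, then feeds its output to the next node.

class Node:
    """A processing stage: optionally trainable, always executable."""
    def train(self, data):
        pass  # default: nothing to learn

    def execute(self, data):
        raise NotImplementedError


class CenterNode(Node):
    """Learns the mean of the training data, subtracts it on execution."""
    def __init__(self):
        self.mean = None

    def train(self, data):
        self.mean = sum(data) / len(data)

    def execute(self, data):
        return [x - self.mean for x in data]


class ScaleNode(Node):
    """Stateless node: multiplies every value by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def execute(self, data):
        return [x * self.factor for x in data]


class Flow:
    """A linear chain of nodes: train each stage on the output of the
    previously trained stages, then execute the whole chain."""
    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, data):
        for node in self.nodes:
            node.train(data)
            data = node.execute(data)

    def execute(self, data):
        for node in self.nodes:
            data = node.execute(data)
        return data


flow = Flow([CenterNode(), ScaleNode(2.0)])
flow.train([1.0, 2.0, 3.0])
print(flow.execute([1.0, 2.0, 3.0]))  # centered, then doubled: [-2.0, 0.0, 2.0]
```

A "joint training" layer, as mentioned above, would sit on top of such a chain and update several nodes together rather than greedily one at a time.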
- Dataset abstractions handled by transformer nodes: rather than
defining several "views" or "casts" of datasets, most of these
libraries (particularly mdp and scikit.learn) allow you to put in
whatever kind of data you want, and then have your processing nodes in
your DAG format the data correctly for the later parts of the DAG.
This makes it easy to use several transformed versions of the dataset
(like chopping images up into small patches) without pylearn having to
include functionality for all of these possible transformations.
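To make the transformer-node idea concrete, here is a minimal sketch of the patch-chopping example from the bullet above: the dataset stays raw, and an early node in the DAG reshapes it for the nodes that follow. The function name and data layout are illustrative assumptions.

```python
# Hypothetical transformer step: split a 2D image (a list of rows)
# into non-overlapping square patches, so downstream nodes only ever
# see patch-sized inputs and pylearn needs no built-in "patch view".

def chop_into_patches(image, patch_size):
    n = patch_size
    patches = []
    for i in range(0, len(image) - n + 1, n):
        for j in range(0, len(image[0]) - n + 1, n):
            patches.append([row[j:j + n] for row in image[i:i + n]])
    return patches

# One toy 4x4 "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = chop_into_patches(image, 2)
print(len(patches))  # 4 patches of shape 2x2
```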
- mdp and scikit.learn both have a clearly defined inheritance
structure, with a small number of root-level superclasses exposing
most of the functionality of the library through their method
signatures.
- Checkpoints: mdp allows the user to specify arbitrary callbacks to
run at various points during training or processing. This is mainly
designed for the user to be able to save state for crash-recovery
purposes, but could have other uses, like visualizing the evolution of
the weights over time.
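The checkpoint mechanism can be sketched as a trainer that accepts an arbitrary callback and fires it at chosen points during training. The `Trainer` class and its hook are illustrative assumptions, not mdp's actual interface.

```python
# Hypothetical checkpoint hook: the caller supplies any callable,
# which the trainer invokes after every epoch. The same hook could
# pickle state to disk for crash recovery, or record weights for
# later visualization.

class Trainer:
    def __init__(self, checkpoint=None):
        self.checkpoint = checkpoint  # callable(epoch, weight) or None
        self.weight = 0.0

    def fit(self, data, epochs):
        for epoch in range(epochs):
            for x in data:
                self.weight += 0.1 * (x - self.weight)  # toy update rule
            if self.checkpoint is not None:
                self.checkpoint(epoch, self.weight)

history = []
trainer = Trainer(checkpoint=lambda epoch, w: history.append((epoch, w)))
trainer.fit([1.0, 1.0], epochs=3)
print(len(history))  # one snapshot per epoch: 3
```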
- mdp includes an interface for learners to declare that they can learn
in parallel, i.e. the same object can look at different data on
different CPU cores. This is not useful for SGD-based models but could
be nice for PCA/SFA-type models (which is most of what mdp
implements).
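The reason this works for PCA/SFA-type models is that they learn from sufficient statistics (sums, counts, covariances) that combine exactly across data shards, whereas SGD updates depend on the order in which data is seen. A minimal sketch, with `is_parallelizable`, `fork`, and `join` as assumed names for such an interface:

```python
# Hypothetical parallel-learning declaration: each fork accumulates
# statistics on its own shard of data (potentially on its own CPU
# core), and join() merges them back into the main estimator.

class MeanEstimator:
    is_parallelizable = True  # declares support for fork/join training

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def train(self, data):
        self.total += sum(data)
        self.count += len(data)

    def fork(self):
        return MeanEstimator()

    def join(self, other):
        self.total += other.total
        self.count += other.count

    @property
    def mean(self):
        return self.total / self.count

# Train two forks on separate shards, then merge; the result is
# identical to training on all the data at once.
main = MeanEstimator()
forks = [main.fork(), main.fork()]
forks[0].train([1.0, 2.0])
forks[1].train([3.0, 4.0])
for f in forks:
    main.join(f)
print(main.mean)  # 2.5
```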
- Monte Python has humorously named classes, such as the 'gym', which
is the package that contains all of the 'trainers'.
- pyml has support for sparse datasets.
- pyml has an 'aggregate dataset' that can combine other datasets.


2. Recommendations for distributing pylearn

pylearn appears to be the most ambitious of all existing python
machine learning projects. There is no established machine learning
library whose scope is broad enough for us to contribute to that
library rather than developing our own.

Some libraries are frequently cited in the literature and
well-respected. One example is libsvm. We should wrap these libraries
so that pylearn users can run experiments with the most
well-established and credible implementations possible.

Wrapping 3rd-party libraries may present some issues with licensing.
We expect to release pylearn under the BSD license (so that business
partners such as Ubisoft can use it in shipped products), but much of
the code we want to wrap may be released under the GPL or some other
license that prevents inclusion in a BSD project. We therefore propose
to keep only core functionality in pylearn itself, and to put most
implementations of actual algorithms into separate packages. One
package could provide a set of BSD-licensed plugins developed by us or
based on wrapping BSD-licensed 3rd-party libraries, and another
package could provide a set of GPL-licensed plugins developed by
wrapping GPL'ed code.

3. Recommendations for implementations to wrap

shogun:
    large-scale kernel learning (mostly SVMs). This wraps other
    libraries we should definitely be interested in, such as libsvm
    (because it is well-established), and others that get
    state-of-the-art performance or are good for extremely large
    datasets, etc.
milk:
    k-means
    SVMs with arbitrary python types for kernel arguments
pybrain:
    LSTM
mlpy:
    feature selection
mdp:
    ICA
    LLE
scikit.learn:
    lasso
    nearest neighbor
    isomap
    various metrics
    mean shift
    cross-validation
    LDA
    HMMs
Yet Another Python Graph Library:
    graph similarity functions that could be useful if we want to
    learn with graphs as data