comparison doc/v2_planning/plugin_JB.py @ 1200:acfd5e747a75
v2planning - a few changes to plugin proposals
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Mon, 20 Sep 2010 11:28:23 -0400 |
parents | 98954d8cb92d |
children | 865936d8221b |
comparison
1199:98954d8cb92d | 1200:acfd5e747a75 |
---|---|
1 """plugin_JB - draft of library architecture using iterators""" | 1 """plugin_JB - draft of potential library architecture using iterators |
2 | |
3 This strategy makes use of a simple imperative language whose statements are python function | |
4 calls to create learning algorithms that can be manipulated and executed in several desirable | |
5 ways. | |
6 | |
7 The training procedure for a PCA module is easy to express: | |
8 | |
9 # allocate the relevant modules | |
10 dataset = Dataset(numpy.random.RandomState(123).randn(13,1)) | |
11 pca = PCA_Analysis() | |
12 pca_batchsize=1000 | |
13 | |
14 # define the control-flow of the algorithm | |
15 train_pca = SEQ([ | |
16 BUFFER_REPEAT(pca_batchsize, CALL(dataset.next)), | |
17 FILT(pca.analyze)]) | |
18 | |
19 # run the program | |
20 VirtualMachine(train_pca).run() | |
21 | |
22 The CALL, SEQ, FILT, and BUFFER_REPEAT are control-flow elements. The control-flow elements I | |
23 defined so far are: | |
24 | |
25 - CALL - a basic statement, just calls a python function | |
26 - FILT - like call, but passes the return value of the last CALL or FILT to the python function | |
27 - SEQ - a sequence of elements to run in order | |
28 - REPEAT - do something N times (and return None or maybe the last CALL?) | |
29 - BUFFER_REPEAT - do something N times and accumulate the return value from each iter | |
30 - LOOP - do something an infinite number of times | |
31 - CHOOSE - like a switch statement (should rename to SWITCH) | |
32 - WEAVE - interleave execution of multiple control-flow elements | |
33 | |
34 | |
35 We don't have many requirements per-se for the architecture, but I think this design respects | |
36 and realizes all of them. | |
37 The advantages of this approach are: | |
38 | |
39 - algorithms (including partially run ones) are COPYABLE, and SERIALIZABLE | |
40 | |
41 - algorithms can be executed without seizing control of the python process (the VM is an | |
42 iterator) so your main loop (aka alternate VM implementation) can be checking for network | |
43 or filesystem events related to job management | |
44 | |
45 - the library can provide learning algorithms via control-flow templates, and the user can | |
46 edit them (with search/replace calls) to include HOOKS, and DIAGNOSTIC plug-in | |
47 functionality | |
48 | |
49 e.g. prog.find(CALL(cd1_update, layer=layer1)).replace_with( | |
50 SEQ([CALL(cd1_update, layer=layer1), CALL(my_debugfn)])) | |
51 | |
52 - user can print the 'program code' of an algorithm built from library pieces | |
53 | |
54 - program can be optimized automatically. | |
55 | |
56 - e.g. BUFFER(N, CALL(dataset.next)) could be replaced if dataset.next implements the | |
57 right attribute/protocol for 'bufferable' or something. | |
58 | |
59 - e.g. SEQ([a,b,c,d]) could be compiled to a single CALL to a Theano-compiled function | |
60 if a, b, c, and d are calls to callable objects that export something like a | |
61 'theano_SEQ' interface | |
2 | 62 |
3 | 63 |
4 """ | 64 """ |
5 | 65 |
6 - PICKLABLE - algorithms are serializable at all points during execution | 66 __license__ = 'TODO' |
7 | 67 __copyright__ = 'TODO' |
8 - ITERATOR walks through algorithms with fine granularity | |
9 | |
10 - COMPONENTS - library provides components on which programs operate | |
11 | |
12 - ALGORITHMS - library provides algorithms in clean (no hooks) form | |
13 | |
14 - HOOKS - user can insert print / debug logic with search/replace type calls | |
15 e.g. prog.find(CALL(cd1_update)).replace_with(SEQ([CALL(cd1_update), CALL(debugfn)])) | |
16 | |
17 - PRINTING - user can print the 'program code' of an algorithm built from library pieces | |
18 | |
19 - MODULAR EXPERIMENTS - an experiment object with one (or more?) programs and all of the objects referred to by | |
20 those programs. It is the preferred type of object to be serialized. The main components of | |
21 the algorithms should be top-level attributes of the package. This object can be serialized | |
22 and loaded in another process to implement job migration. | |
23 | |
24 - OPTIMIZATION - program can be optimized automatically | |
25 e.g. BUFFER(N, CALL(dataset.next)) can be replaced if dataset.next implements the right | |
26 attribute/protocol for 'bufferable' or something. | |
27 | |
28 e.g. SEQ([a,b,c,d]) can be compiled with Theano if sub-sequence is compatible | |
29 | |
30 - don't need greenlets to get efficiency, the implementations of control flow ops can manage a | |
31 stack or stack tree in the vm (like greenlets do I think) we don't really need | |
32 greenlets/stackless I don't think | |
33 | |
34 """ | |
35 | |
36 __license__ = None | |
37 __copyright__ = None | |
38 | 68 |
39 import copy, sys, cPickle | 69 import copy, sys, cPickle |
40 | |
41 import numpy | 70 import numpy |
42 | |
43 | 71 |
44 ################################################### | 72 ################################################### |
45 # Virtual Machine for executing programs | 73 # Virtual Machine for executing programs |
46 | 74 |
47 class VirtualMachine(object): | 75 class VirtualMachine(object): |