comparison doc/v2_planning/plugin_JB.py @ 1200:acfd5e747a75

v2planning - a few changes to plugin proposals
author James Bergstra <bergstrj@iro.umontreal.ca>
date Mon, 20 Sep 2010 11:28:23 -0400
parents 98954d8cb92d
children 865936d8221b
1199:98954d8cb92d 1200:acfd5e747a75
1 """plugin_JB - draft of library architecture using iterators""" 1 """plugin_JB - draft of potential library architecture using iterators
2
3 This strategy makes use of a simple imperative language whose statements are python function
4 calls to create learning algorithms that can be manipulated and executed in several desirable
5 ways.
6
7 The training procedure for a PCA module is easy to express:
8
9 # allocate the relevant modules
10 dataset = Dataset(numpy.random.RandomState(123).randn(13,1))
11 pca = PCA_Analysis()
12 pca_batchsize=1000
13
14 # define the control-flow of the algorithm
15 train_pca = SEQ([
16 BUFFER_REPEAT(pca_batchsize, CALL(dataset.next)),
17 FILT(pca.analyze)])
18
19 # run the program
20 VirtualMachine(train_pca).run()
21
22 CALL, SEQ, FILT, and BUFFER_REPEAT are control-flow elements. The control-flow elements I
23 have defined so far are:
24
25 - CALL - a basic statement that just calls a python function
26 - FILT - like CALL, but passes the return value of the last CALL or FILT to the python function
27 - SEQ - a sequence of elements to run in order
28 - REPEAT - do something N times (and return None or maybe the last CALL?)
29 - BUFFER_REPEAT - do something N times and accumulate the return value from each iteration
30 - LOOP - do something an infinite number of times
31 - CHOOSE - like a switch statement (should rename to SWITCH)
32 - WEAVE - interleave execution of multiple control-flow elements
33
34
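A minimal sketch of how a few of the elements listed above could behave, assuming programs are nested tuples walked by a generator. This is not the implementation in this file: `ToyVM`, its `last` attribute, and `next_example` are hypothetical names, and only CALL, FILT, SEQ, and BUFFER_REPEAT are covered.

```python
# Toy interpreter for four control-flow elements (illustrative assumption,
# NOT the VirtualMachine defined in plugin_JB.py).

def CALL(fn, *args, **kwargs):
    return ('CALL', fn, args, kwargs)

def FILT(fn):
    return ('FILT', fn)

def SEQ(elements):
    return ('SEQ', list(elements))

def BUFFER_REPEAT(n, element):
    return ('BUFFER_REPEAT', n, element)


class ToyVM(object):
    """Hypothetical VM: iterating over it runs one python call per step."""

    def __init__(self, prog):
        self.prog = prog
        self.last = None  # return value of the most recent element

    def _exec(self, node):
        kind = node[0]
        if kind == 'CALL':
            _, fn, args, kwargs = node
            self.last = fn(*args, **kwargs)
            yield  # hand control back to the caller after each call
        elif kind == 'FILT':
            # FILT feeds the previous return value to the function
            self.last = node[1](self.last)
            yield
        elif kind == 'SEQ':
            for child in node[1]:
                for _step in self._exec(child):
                    yield
        elif kind == 'BUFFER_REPEAT':
            _, n, child = node
            buf = []
            for _i in range(n):
                for _step in self._exec(child):
                    yield
                buf.append(self.last)
            self.last = buf  # the accumulated return values
        else:
            raise ValueError('unknown element: %r' % (kind,))

    def __iter__(self):
        return self._exec(self.prog)

    def run(self):
        for _step in self:
            pass
        return self.last


# Usage: buffer three examples, then pass the buffer to a filter.
counter = iter(range(100))

def next_example():
    return next(counter)

train = SEQ([BUFFER_REPEAT(3, CALL(next_example)), FILT(sum)])
vm = ToyVM(train)
steps = 0
for _step in vm:  # the caller keeps control between steps
    steps += 1
print(steps, vm.last)
```

Because this toy keeps its position inside generator objects, a partially run program cannot be copied or pickled. An implementation that wants those properties would need to keep an explicit stack in the VM instead.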
35 We don't have many requirements per se for the architecture, but I think this design respects
36 and realizes all of them.
37 The advantages of this approach are:
38
39 - algorithms (including partially run ones) are COPYABLE, and SERIALIZABLE
40
41 - algorithms can be executed without seizing control of the python process (the VM is an
42 iterator), so your main loop (i.e. an alternate VM implementation) can check for network
43 or filesystem events related to job management
44
45 - the library can provide learning algorithms via control-flow templates, and the user can
46 edit them (with search/replace calls) to include HOOKS, and DIAGNOSTIC plug-in
47 functionality
48
49 e.g. prog.find(CALL(cd1_update, layer=layer1)).replace_with(
50 SEQ([CALL(cd1_update, layer=layer1), CALL(my_debugfn)]))
51
52 - user can print the 'program code' of an algorithm built from library pieces
53
54 - program can be optimized automatically.
55
56 - e.g. BUFFER_REPEAT(N, CALL(dataset.next)) could be replaced if dataset.next implements
57 the right attribute/protocol (something like 'bufferable').
58
59 - e.g. SEQ([a,b,c,d]) could be compiled to a single CALL to a Theano-compiled function
60 if a, b, c, and d are calls to callable objects that export something like a
61 'theano_SEQ' interface
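The search/replace and printing points above can be illustrated with a small sketch, assuming programs are plain nested tuples. The helpers `replace` and `show` are hypothetical and do not exist in this file. Unlike the `prog.find(...).replace_with(...)` call shown above, `replace` returns a new program rather than editing in place.

```python
# Hypothetical sketch: programs as nested tuples that can be searched,
# rewritten and printed. None of these helpers come from plugin_JB.py.

def CALL(fn, **kwargs):
    # keyword arguments are stored sorted so that equal CALLs compare equal
    return ('CALL', fn, tuple(sorted(kwargs.items())))

def SEQ(elements):
    return ('SEQ', tuple(elements))

def replace(prog, target, replacement):
    """Return a copy of prog with every node equal to target swapped out."""
    if prog == target:
        return replacement
    if prog[0] == 'SEQ':
        return SEQ(replace(child, target, replacement) for child in prog[1])
    return prog

def show(prog, indent=0):
    """Render the 'program code' of prog as an indented string."""
    pad = '  ' * indent
    if prog[0] == 'CALL':
        args = ', '.join('%s=%r' % kv for kv in prog[2])
        return '%sCALL(%s%s)' % (pad, prog[1].__name__,
                                 ', ' + args if args else '')
    lines = [pad + 'SEQ([']
    lines += [show(child, indent + 1) + ',' for child in prog[1]]
    lines += [pad + '])']
    return '\n'.join(lines)


# Usage: wrap one library step with a user-supplied debug hook.
def cd1_update(layer=None):
    pass

def my_debugfn():
    pass

prog = SEQ([CALL(cd1_update, layer='layer1'),
            CALL(cd1_update, layer='layer2')])
target = CALL(cd1_update, layer='layer1')
hooked = replace(prog, target, SEQ([target, CALL(my_debugfn)]))
print(show(hooked))
```

Matching by equality means the hook attaches only to the step whose arguments match exactly, here the update of `layer1`. The library template itself stays free of hooks.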
2 62
3 63
4 """ 64 """
5 65
6 - PICKLABLE - algorithms are serializable at all points during execution 66 __license__ = 'TODO'
7 67 __copyright__ = 'TODO'
8 - ITERATOR walks through algorithms with fine granularity
9
10 - COMPONENTS - library provides components on which programs operate
11
12 - ALGORITHMS - library provides algorithms in clean (no hooks) form
13
14 - HOOKS - user can insert print / debug logic with search/replace type calls
15 e.g. prog.find(CALL(cd1_update)).replace_with(SEQ([CALL(cd1_update), CALL(debugfn)]))
16
17 - PRINTING - user can print the 'program code' of an algorithm built from library pieces
18
19 - MODULAR EXPERIMENTS - an experiment object with one (or more?) programs and all of the objects referred to by
20 those programs. It is the preferred type of object to be serialized. The main components of
21 the algorithms should be top-level attributes of the package. This object can be serialized
22 and loaded in another process to implement job migration.
23
24 - OPTIMIZATION - program can be optimized automatically
25 e.g. BUFFER(N, CALL(dataset.next)) can be replaced if dataset.next implements the right
26 attribute/protocol for 'bufferable' or something.
27
28 e.g. SEQ([a,b,c,d]) can be compiled with Theano if sub-sequence is compatible
29
30 - greenlets are not needed for efficiency: the implementations of control-flow ops can manage a
31 stack or stack tree in the VM (as greenlets do, I think), so we don't really need
32 greenlets/stackless
33
34 """
35
36 __license__ = None
37 __copyright__ = None
38 68
39 import copy, sys, cPickle 69 import copy, sys, cPickle
40
41 import numpy 70 import numpy
42
43 71
44 ################################################### 72 ###################################################
45 # Virtual Machine for executing programs 73 # Virtual Machine for executing programs
46 74
47 class VirtualMachine(object): 75 class VirtualMachine(object):