comparison doc/v2_planning/arch_FB.txt @ 1291:ea923a06dea6

added my architecture proposal.
author Frederic Bastien <nouiz@nouiz.org>
date Thu, 30 Sep 2010 10:16:35 -0400
parents
children abc7a7e22ead
1290:0ea25edd97e5 1291:ea923a06dea6
1 Current state and extension of our framework
2 ============================================
3
4 Assumptions I make:
5
6 * The Dataset, Learner and Layers committees have done their work.
7 * That means we have an easier way to build a learning model.
8 * Checkpointing is considered solved by one of: ignoring it (short jobs), not caring, manual checkpoints, structured checkpoints with an example, or the OB system.
9
10 Example MLP
11 -----------
12
13 * Select the hyper parameter search space with `jobman sqlschedules`
14 * Dispatch the jobs with dbidispatch
15 * *Manually* (fixable) reset the jobs' status to START.
16 * I started it, but I will change the syntax to make it more generic.
17 * *Manually* relaunch crashed jobs.
18 * *Manually* (fixable) analyse/visualise the results. (We need to start those meetings at some point.)
19
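As an illustration of what "selecting the search space" amounts to, here is a minimal sketch that expands a grid of candidate values into one parameter set per job. It is not jobman code: the function name ``expand_search_space`` and the parameter names and values are hypothetical.

```python
import itertools


def expand_search_space(space):
    """Return one dict per combination of the candidate values in `space`."""
    names = sorted(space)  # fixed order so the result is reproducible
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical MLP search space; names and values are illustrative only.
mlp_space = {
    "learning_rate": [0.1, 0.01],
    "n_hidden": [100, 500, 1000],
}
jobs = expand_search_space(mlp_space)
```

Each element of ``jobs`` would correspond to one scheduled job.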
20 Example MLP+cross validation
21 ----------------------------
22
23 * Modify the dataset interface to accept two new hyper parameters: nb_cross_fold=X and id_cross_fold.
24 * Schedule all of the folds to run in parallel with `jobman sqlschedules`
25 * *Manually* (fixable) reset the jobs' status to START.
26 * *Manually* relaunch crashed jobs.
27 * *Manually* (fixable) analyse/visualize the result.
28 * Those tools need to understand the concept of cross validation
29 * *Manually* (fixable with the proposal below) launch a retraining on the full dataset with the best hyper parameters.
30
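A minimal sketch of how a dataset could interpret the two hyper parameters, assuming that ``id_cross_fold`` is zero-based and that folds are contiguous blocks of examples. Both points are assumptions, as is the function name ``cross_fold_split``; the dataset interface itself is not defined here.

```python
def cross_fold_split(n_examples, nb_cross_fold, id_cross_fold):
    """Return (train_indices, valid_indices) for one fold.

    Fold sizes differ by at most one example when n_examples is not
    a multiple of nb_cross_fold.
    """
    if not 0 <= id_cross_fold < nb_cross_fold:
        raise ValueError("id_cross_fold must be in [0, nb_cross_fold)")
    # Integer arithmetic gives contiguous, non-overlapping blocks.
    start = (n_examples * id_cross_fold) // nb_cross_fold
    stop = (n_examples * (id_cross_fold + 1)) // nb_cross_fold
    valid = list(range(start, stop))
    train = list(range(0, start)) + list(range(stop, n_examples))
    return train, valid
```

Because each job only needs its own ``id_cross_fold``, all folds can be scheduled as independent jobs.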
31
32 Example DBN
33 -----------
34
35 * *Concept* JOB PHASE. DBN (unsupervised and supervised)
36 * We assume the job script has a parameter that tells it which phase to run.
37 * *Jobman Extension* We can extend jobman to handle dependencies between jobs.
38 * Proposed syntax:
39
40 .. code-block::
41
42 jobman sqlschedule p0={{}} ... -- p1={{}} ... -- p2=...
43
44 * The parameters before the first `--` tell which jobs the new jobs depend on. (This allows depending on many jobs at the same time.)
45 * The parameters between the two `--` tell that we want to create a new group of jobs for all those jobs.
46 * The parameters after the second `--` define the new jobs to be created for each new group of jobs.
47
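The three-part syntax proposed above could be split roughly as follows. This is only a sketch of the argument handling, not existing jobman code; ``split_dependency_args`` and the example values are hypothetical.

```python
def split_dependency_args(args):
    """Split a list of arguments into the three groups separated by "--"."""
    groups = [[]]
    for arg in args:
        if arg == "--":
            groups.append([])  # start the next group
        else:
            groups[-1].append(arg)
    if len(groups) != 3:
        raise ValueError(
            "expected exactly two '--' separators, got %d" % (len(groups) - 1)
        )
    depends_on, group_by, new_jobs = groups
    return depends_on, group_by, new_jobs


# Illustrative values only; the names mirror the placeholder command above.
example = ["p0={{1,2}}", "--", "p1={{a,b}}", "--", "p2=x"]
depends_on, group_by, new_jobs = split_dependency_args(example)
```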
48 * *Jobman Extension* create `jobman dispatch`
49 * This will dispatch new jobs to run on the cluster with dbidispatch once all of a job's dependencies have finished.
50 * *Jobman Extension* create `jobman monitor`
51 * This repeatedly calls `jobman condor_check` and prints on the screen the jobs that may have crashed. It needs to filter the output of condor_check.
52 * We can create other `jobman CLUSTER_check` commands for mammouth, colosse, angel, ...
53 * *Jobman Extension* when we change the status of a job to START in jobman, change the status of the jobs that depend on it at the same time.
54 * *Jobman Extension* determine if a job finished correctly or not
55 * If a job did not finish correctly, don't start the following jobs.
56 * *Jobman Policy* All changes to the db should be doable through jobman commands.
57
58 * *Manually* relaunch crashed jobs.
59 * *Manually* (fixable) analyse/visualise the results.
60 * Those tools need to understand the concept of job phase or be agnostic to it.
61
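The dependency rules listed above (dispatch only when dependencies finished correctly, and propagate a reset to dependent jobs) can be sketched with plain dictionaries. Only the START status comes from this document; the other status names, the in-memory representation and both function names are assumptions made for illustration.

```python
# Only START is named in this document; the other statuses are hypothetical.
START, DONE, FAILED, WAITING = "START", "DONE", "FAILED", "WAITING"


def ready_to_dispatch(status, depends_on):
    """Return ids of waiting jobs whose dependencies all finished correctly."""
    ready = []
    for job in sorted(status):
        if status[job] != WAITING:
            continue
        if all(status[dep] == DONE for dep in depends_on.get(job, ())):
            ready.append(job)
    return ready


def reset_to_start(job, status, depends_on):
    """Reset `job` to START and put every job depending on it back to WAITING."""
    status[job] = START
    seen = set([job])
    stack = [job]
    while stack:
        current = stack.pop()
        for other, deps in depends_on.items():
            if current in deps and other not in seen:
                seen.add(other)
                status[other] = WAITING  # must wait for `current` again
                stack.append(other)


# Two DBN-style chains: unsupervised phase first, supervised phase second.
status = {"pretrain_a": DONE, "finetune_a": WAITING,
          "pretrain_b": FAILED, "finetune_b": WAITING}
depends_on = {"finetune_a": ["pretrain_a"], "finetune_b": ["pretrain_b"]}
ready = ready_to_dispatch(status, depends_on)
```

Here ``finetune_b`` is never returned because its first phase did not finish correctly.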
62
63 * *Cross validation retrain* can be done with an additional phase in the extensions.
64 * The new job needs to know how to determine the best hyper parameters from the results.
65
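One possible way for the retrain job to pick the best hyper parameters is to average the validation error over the folds and keep the minimum. The result format (a list of tuples) and the selection criterion are assumptions; the name ``best_hyper_parameters`` is hypothetical.

```python
def best_hyper_parameters(results):
    """Return the hyper parameters with the lowest mean validation error.

    `results` is a list of (hyper_parameters, id_cross_fold, valid_error)
    tuples, where hyper_parameters is a dict.
    """
    errors = {}
    for params, _fold, valid_error in results:
        key = tuple(sorted(params.items()))  # dicts are not hashable
        errors.setdefault(key, []).append(valid_error)
    if not errors:
        raise ValueError("no results to choose from")
    best_key = min(errors, key=lambda k: sum(errors[k]) / float(len(errors[k])))
    return dict(best_key)
```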
66
67 * This can be extended for double cross validation.
68 * Dataset must support double cross validation
69 * We create more phases in jobman.
70
71 Hyper parameter search in Pylearn
72 ---------------------------------
73
74 In some cases we would want the hyper parameter search to be done within pylearn. This will add a dependency on jobman. We can finish/verify how jobman works with sqlite so that it does not require an installed db. sqlite is included in python 2.5, and jobman requires python 2.5. We could make jobman's dependency on sqlalchemy optional when we use sqlite, to limit the number of dependencies.
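To show that a job table needs nothing beyond the standard library when sqlite is used, here is a small sketch built on Python's ``sqlite3`` module. It does not reproduce jobman's actual schema or API; the table layout and function names are hypothetical.

```python
import sqlite3


def open_job_table(path=":memory:"):
    """Open (or create) a sqlite database holding a simple job table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, params TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def add_job(conn, params, status="START"):
    """Insert one job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (params, status) VALUES (?, ?)", (params, status)
    )
    conn.commit()
    return cur.lastrowid


def jobs_with_status(conn, status):
    """Return (id, params) for every job currently in `status`."""
    rows = conn.execute(
        "SELECT id, params FROM jobs WHERE status = ? ORDER BY id", (status,)
    )
    return rows.fetchall()
```

Passing a file path instead of ``":memory:"`` would keep the table on disk without any database server.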