annotate doc/v2_planning/arch_FB.txt @ 1309:e5b7a7913329

fix rst error.
author Frederic Bastien <nouiz@nouiz.org>
date Tue, 05 Oct 2010 12:26:02 -0400
parents abc7a7e22ead
children
rev   line source
1291
ea923a06dea6 added my architecture proposal.
Frederic Bastien <nouiz@nouiz.org>
parents:
diff changeset
1 Current and extension of our framework
2 =======================================
3
1292
abc7a7e22ead added comparaison with other proposal.
Frederic Bastien <nouiz@nouiz.org>
parents: 1291
diff changeset
4 This proposal is complementary to PL's hook system and OB's checkpoint system. It could be part of the backend of James's system. I do not remember or know the other proposals well enough to compare with them.
5
1291
6 Assumptions I make:
7
8 * The Dataset, Learner and Layers committees have done their work.
9 * That means we have an easier way to build a learning model.
10 * Checkpointing is solved in one of these ways: we ignore it (short jobs), we don't care, we checkpoint manually, we use a structured checkpoint with an example, or we use OB's system.
11
12 Example MLP
13 -----------
14
15 * Select the hyper parameter search space with `jobman sqlschedules`
16 * Dispatch the jobs with dbidispatch
1309
17 * *Manually* (fixable) reset jobs status to START.
1291
18 * I started it, but I will change the syntax to make it more generic.
19 * *Manually* relaunch crashed jobs.
1309
20 * *Manually* (fixable) analyse/visualise the result. (We need to start those meetings at some point.)
1291
21
22 Example MLP+cross validation
23 -----------------------------
24
25 * Modify the dataset interface to accept 2 new hyper parameters: nb_cross_fold=X and id_cross_fold.
26 * Schedule all of the folds to run in parallel with `jobman sqlschedules`
27 * *Manually* (fixable) reset jobs status to START.
28 * *Manually* relaunch crashed jobs.
29 * *Manually* (fixable) analyse/visualise the result.
30 * Those tools need to understand the concept of cross validation
31 * *Manually* (fixable with the proposal below) launch a retraining on the full dataset with the best hyper parameters
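As an illustration of the two new dataset hyper parameters, here is a minimal Python sketch of how one fold could be selected. The function name `cross_fold_indices` and the modulo-based split are assumptions made for this example and are not part of the existing dataset interface; only `nb_cross_fold` and `id_cross_fold` come from the list above.

```python
def cross_fold_indices(n_examples, nb_cross_fold, id_cross_fold):
    """Return (train_idx, valid_idx) for one fold.

    Hypothetical helper: example i is put in the validation set of
    fold (i % nb_cross_fold). A real dataset may need a shuffled or
    stratified split instead.
    """
    if nb_cross_fold < 2:
        raise ValueError("nb_cross_fold must be at least 2")
    if not 0 <= id_cross_fold < nb_cross_fold:
        raise ValueError("id_cross_fold must be in [0, nb_cross_fold)")
    train_idx = []
    valid_idx = []
    for i in range(n_examples):
        if i % nb_cross_fold == id_cross_fold:
            valid_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, valid_idx
```

Every job would receive the same `nb_cross_fold` and a different `id_cross_fold`, so the folds do not depend on each other and can all be scheduled at once.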
32
33
34 Example DBN
35 -----------
36
37 * *Concept* JOB PHASE. Example: DBN (unsupervised phase, then supervised phase).
38 * We assume the job script has a parameter that tells it which phase to run.
39 * *Jobman Extension* We can extend jobman to handle dependencies between jobs.
40 * Proposed syntax:
41
1309
42 .. code-block:: bash
1291
43
44 jobman sqlschedule p0={{}} ... -- p1={{}} ... -- p2=...
45
46 * The parameters before the first `--` tell which jobs the new jobs depend on. (This allows depending on many jobs at the same time.)
47 * The parameters between the two `--` tell that we want to create a new group of jobs for all those jobs.
48 * The parameters after the second `--` tell which new jobs to create for each new group of jobs.
49
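To make the three-part syntax concrete, here is a small Python sketch that splits an argument list on the `--` separators. It is only an illustration of the proposed syntax: `split_schedule_args` is a hypothetical name, and nothing here is existing jobman code.

```python
def split_schedule_args(argv):
    """Split the arguments of the proposed command on '--'.

    Returns (depends_on, group_by, new_jobs):
      depends_on: parameters before the first '--'
      group_by:   parameters between the two '--'
      new_jobs:   parameters after the second '--'
    Hypothetical helper, not part of jobman.
    """
    groups = [[]]
    for arg in argv:
        if arg == "--":
            groups.append([])  # start the next group
        else:
            groups[-1].append(arg)
    if len(groups) != 3:
        raise ValueError(
            "expected exactly two '--' separators, got %d" % (len(groups) - 1))
    depends_on, group_by, new_jobs = groups
    return depends_on, group_by, new_jobs
```

Rejecting any other number of separators keeps the meaning of each group unambiguous; whether extra groups should instead describe extra phases is left open here.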
50 * *Jobman Extension* create `jobman dispatch`
51 * This will dispatch new jobs to run on the cluster with dbidispatch once a job's dependencies have finished.
52 * *Jobman Extension* create `jobman monitor`
53 * This repeatedly calls `jobman condor_check` and prints on the screen the jobs that may have crashed. It needs to filter the output of condor_check.
54 * We can create other `jobman CLUSTER_check` commands for mammouth, colosse, angel, ...
55 * *Jobman Extension* when we change the status of a job to START in jobman, change the status of the jobs that depend on it at the same time.
56 * *Jobman Extension* determine if a job finished correctly or not
57 * If a job did not finish correctly, don't start the following jobs.
58 * *Jobman Policy* All changes to the db should be doable with jobman commands.
59
60 * *Manually* relaunch crashed jobs.
1309
61 * *Manually* (fixable) analyse/visualise the result.
1291
62 * Those tools need to understand the concept of job phase or be agnostic to it.
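The dependency rules listed above (dispatch a job only when its dependencies finished correctly, and reset dependent jobs together with the job they depend on) can be sketched in a few lines of Python. This is a sketch under assumptions, not jobman's implementation: the `Job` class, the function names, and the status names other than START are invented for the example.

```python
# Only START is named in the proposal; the other status names are
# assumptions made for this sketch.
START, RUNNING, DONE, FAILED = "START", "RUNNING", "DONE", "FAILED"


class Job(object):
    """Hypothetical in-memory job record with a phase and dependencies."""

    def __init__(self, name, phase, depends_on=()):
        self.name = name
        self.phase = phase
        self.depends_on = list(depends_on)
        self.status = START


def ready_jobs(jobs):
    """Jobs in START whose dependencies have all finished correctly."""
    by_name = dict((job.name, job) for job in jobs)
    return [job for job in jobs
            if job.status == START
            and all(by_name[dep].status == DONE for dep in job.depends_on)]


def reset_to_start(jobs, name):
    """Reset job `name` and every job depending on it, directly or not."""
    to_reset = set([name])
    changed = True
    while changed:  # repeat until no new dependent job is found
        changed = False
        for job in jobs:
            if job.name not in to_reset and to_reset.intersection(job.depends_on):
                to_reset.add(job.name)
                changed = True
    for job in jobs:
        if job.name in to_reset:
            job.status = START
    return sorted(to_reset)
```

With a DBN, the supervised job would list the unsupervised job in `depends_on`: it is returned by `ready_jobs` only once the unsupervised job is DONE, and it is never returned if that job FAILED.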
63
64
65 * *Cross validation retrain* can be done with an additional phase in the extensions.
66 * The new job needs to know how to determine the best hyper parameters from the results.
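One possible way for the retrain job to pick the best hyper parameters is to average the validation error over the folds and keep the configuration with the lowest mean. The sketch below assumes the results are available as `(hyper_params, id_cross_fold, valid_error)` tuples; that layout, the function name, and the choice of the mean as selection criterion are assumptions for illustration.

```python
from collections import defaultdict


def best_hyper_parameters(results):
    """Return the hyper parameters with the lowest mean validation error.

    `results` is an iterable of (hyper_params, id_cross_fold, valid_error),
    where hyper_params is hashable, e.g. a tuple of (name, value) pairs.
    Hypothetical helper, not part of jobman or pylearn.
    """
    errors = defaultdict(list)
    for hyper_params, id_cross_fold, valid_error in results:
        errors[hyper_params].append(valid_error)
    if not errors:
        raise ValueError("no results to choose from")
    mean_error = dict((hp, sum(errs) / float(len(errs)))
                      for hp, errs in errors.items())
    return min(mean_error, key=mean_error.get)
```

The retrain phase would then launch one job on the full dataset with the returned hyper parameters.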
67
68
69 * This can be extended for double cross validation.
70 * Dataset must support double cross validation
71 * We create more phases in jobman.
72
73 Hyper parameter search in Pylearn
74 ---------------------------------
75
76 In some cases we would want the hyper parameter search to be done in Pylearn. This will add a dependency on jobman. We can finish/verify how jobman works with sqlite so that it does not require an installed db. sqlite is included in Python 2.5, and jobman requires Python 2.5. We could make the jobman dependency on sqlalchemy optional when we use sqlite, to limit the number of dependencies.
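To show that a job table needs nothing beyond the standard library, here is a minimal sketch using the `sqlite3` module that ships with Python. The table layout and the function names are assumptions made for this example; they are not jobman's actual schema or API.

```python
import sqlite3


def create_job_table(conn):
    """Create a hypothetical, minimal job table (not jobman's schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "params TEXT NOT NULL)")
    conn.commit()


def add_job(conn, params):
    """Insert a job in the START state and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (status, params) VALUES (?, ?)", ("START", params))
    conn.commit()
    return cur.lastrowid


def jobs_with_status(conn, status):
    """Return (id, params) for every job with the given status."""
    cur = conn.execute(
        "SELECT id, params FROM jobs WHERE status = ? ORDER BY id", (status,))
    return cur.fetchall()
```

Passing a file path to `sqlite3.connect` gives a database that needs no server, and `":memory:"` is enough for tests.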