# HG changeset patch
# User gdesjardins
# Date 1283540483 14400
# Node ID 91916536a3046ee5bfc22f4b2bd1decb5e5c94a8
# Parent  790376d986a37a69de3e71201b7db7936e63569f
# Parent  af80b7d182af3746a12c602c66ac37dbd9c3a4c2
merge

diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/coding_style.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/coding_style.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,16 @@
+Discussion of Coding-Style
+==========================
+
+Participants
+------------
+- Dumitru
+- Fred
+- David
+- Olivier D [leader unless David wants to be]
+
+Existing Python coding style specifications:
+--------------------------------------------
+
+ * http://www.python.org/dev/peps/pep-0008/
+ * http://google-styleguide.googlecode.com/svn/trunk/pyguide.html
+
diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/committees.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/committees.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,58 @@
+List of committees and their members:
+
+* Existing Python ML libraries investigation: GD, DWF, IG, DE
+* Dataset interface: DE, OB, OD, AB, PV
+* Learners: AB, PL, GM, IG, RP
+* Optimization: JB, PL, OD
+* Inference/sampling: JB, GD, AC
+* Job management, analysis, metrics, costs, visualization: GD, FS, PL, XM
+* Formulas/tags: FB, NB, RP, AC, OB
+* Coding style: DE, OD, DWF, FB
+
+Issues to be tackled in the future:
+
+* serialization & reproducibility
+* job management, results analysis, metrics & costs, visualization
+* GPU portability
+* social engineering, code review and incentives
+
+Job of each committee:
+
+* name a leader
+* create a text file in this directory associated with their discussion
+* discuss the issues, write them down in this file
+* come up with an interface, protocol, or recommendations
+* bring up the issues and recommendations to the rest of the group, and get feedback
+* make sure the recommendations fit with other committees' recommendations
+* update the discussion / recommendation file
+* implement documentation for these conventions in python or proper doc files as appropriate
+* implement one or a few examples that show what is typically expected
+
+What is the role of a committee (team) leader?
+
+* The committee leader takes responsibility for both the quality and
+  timeliness of the work of the committee.
+* The role of the leader is *not* to do the work of the other
+  committee members, but instead to motivate others to ensure that the
+  aims of the committee are pursued.
+* The leader should lead by example and, when necessary, reinvigorate the group
+* The leader would preferably be someone who possesses both natural leadership
+  qualities and significant experience in the subject matter of the committee.
+* In choosing the leader, the other committee members must agree to
+  engage in collaboration with, and respect the leadership of, the chosen
+  leader.
+
+Concretely, the leader must:
+
+* Call meetings and set the agenda
+* Focus the efforts of the group to ensure that the committee's
+  priorities are addressed on schedule.
+* Assign well-defined tasks to team members that are to be completed
+  by a fixed deadline.
+* Ensure that the team objectives are met.
+* Follow up with team members to ensure that tasks are completed and
+  objectives are followed.
+
+
+
+
diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/dataset.txt
--- a/doc/v2_planning/dataset.txt	Fri Sep 03 15:01:02 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -1,3 +1,14 @@
 Discussion of Function Specification for Dataset Types
 ======================================================
 
+Some talking points from the September 2 meeting:
+
+ * Datasets as views/tasks (Pascal Vincent's idea): our dataset specification
+   needs to be flexible enough to accommodate different (sub)tasks and views of
+   the same underlying data.
+ * Datasets as probability distributions from which one can sample.
+ * Our specification should allow transparent handling of infinite datasets (or
+   simply datasets which cannot fit in memory)
+ * GPU/buffering issues.
+
+Committee: DE, OB, OD, AB, PV
diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/existing_python_ml_libraries.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/existing_python_ml_libraries.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,18 @@
+Committee members: GD, DWF, IG, DE
+
+This committee will investigate the possibility of interfacing and/or
+borrowing from other Python machine learning libraries that exist out there.
+Some questions that we need to answer:
+
+ * How much should we try to interface with other libraries?
+ * What parts can we and should we implement ourselves and what should we leave
+   to the other libraries?
+
+Preliminary list of libraries to look at:
+
+ * Pybrain
+ * MDP
+ * Orange
+ * Shogun python bindings
+ * libsvm python bindings
+
diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/main_plan.txt
--- a/doc/v2_planning/main_plan.txt	Fri Sep 03 15:01:02 2010 -0400
+++ b/doc/v2_planning/main_plan.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -2,6 +2,47 @@
 Motivation
 ==========
 
+Yoshua (points discussed Thursday Sept 2, 2010 at LISA tea-talk)
+------
+
+****** Why we need to get better organized in our code-writing ******
+
+- current state of affairs on top of Theano is anarchic and does not lend itself to easy code re-use
+- the lab is growing and will continue to grow significantly, and more people outside the lab are using Theano
+- we have new industrial partners and funding sources that demand deliverables, and more/better collectively organized efforts
+
+*** Who can take advantage of this ***
+
+- us, directly, taking advantage of the different advances made by different researchers in the lab to yield better models
+- us, easier to compare different models and different datasets with different metrics on different computing platforms available to us
+- future us, new students, able to quickly move into 'production' mode without having to reinvent the wheel
+- students in the two ML classes, able to play with the library to explore new ML variants
+- other ML researchers in academia, able to play with our algorithms, try new variants, cite our papers
+- non-ML users in or out of academia, and our user-partners
+
+
+*** Move with care ***
+
+- Write down use-cases, examples for each type of module, do not try to be TOO general
+- Want to keep ease of exploring and flexibility, not create a prison
+- Too many constraints can lead to paralysis, especially in C++ object-oriented model
+- Too few guidelines lead to code components that are not interchangeable
+- Poor code practice leads to buggy, spaghetti code
+
+*** What ***
+
+- define standards
+- write up a few instances of each basic type (dataset, learner, optimizer, hyper-parameter exploration boilerplate, etc.) enough to implement some of the basic algorithms we use often (e.g. like those in the tutorials)
+- let the library grow according to our needs
+- keep tight reins on it to control quality
+
+*** Content and Form ***
+
+We need to establish guidelines and conventions for
+
+ * Content: what are the re-usable components? define conventions or API for each, make sure they fit with each other
+ * Form: social engineering, coding practices and conventions, code review, incentives
+
 Yoshua:
 -------
 
diff -r 790376d986a3 -r 91916536a304 doc/v2_planning/optimization.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/optimization.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,37 @@
+Discussion of Optimization-Related Issues
+=========================================
+
+Members: JB, PL, OD
+
+Representative: JB
+
+
+Previous work - scikits, openopt, scipy provide function optimization
+algorithms. These are not currently GPU-enabled but may be in the future.
+
+
+IS PREVIOUS WORK SUFFICIENT?
+--------------------------------
+
+In many cases it is (I used it for sparse coding, and it was ok).
+
+These packages provide batch optimization, whereas we typically need online
+optimization.
+
+It can be faster (to run) and more convenient (to implement) to have
+optimization algorithms as Theano update expressions.
+
+
+What optimization algorithms do we want/need?
+---------------------------------------------
+
+ - sgd
+ - sgd + momentum
+ - sgd with annealing schedule
+ - TONGA
+ - James Martens' Hessian-free
+
+Do we need anything to make batch algos work better with Pylearn things?
+ - conjugate methods?
+ - L-BFGS?
+
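
Illustrative note (not part of the changeset): the first three algorithms listed in optimization.txt (sgd, sgd + momentum, sgd with annealing schedule) can be sketched as online update rules in plain Python. This is only a minimal sketch under stated assumptions, not Pylearn or Theano code: every function name, the 1/t-style schedule and its `tau` parameter, and the toy line-fitting task are hypothetical choices made for the example.

```python
from itertools import cycle, islice


def annealed_lr(lr0, step, tau):
    """Assumed schedule: constant at lr0 until step == tau, then decays as tau/step."""
    return lr0 * (tau / max(tau, step))


def sgd_momentum_step(params, grads, velocity, lr, momentum=0.0):
    """One update; momentum=0.0 reduces to plain sgd."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimize_online(grad_fn, examples, params, lr0=0.1, momentum=0.5, tau=500):
    """Consume examples one at a time (online), so the stream may be a generator."""
    velocity = [0.0] * len(params)
    for step, example in enumerate(examples):
        lr = annealed_lr(lr0, step, tau)
        grads = grad_fn(params, example)
        params, velocity = sgd_momentum_step(params, grads, velocity, lr, momentum)
    return params


def squared_error_grad(params, example):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    w, b = params
    x, y = example
    err = w * x + b - y
    return [err * x, err]


def fit_line_demo(n_steps=2000):
    """Hypothetical toy task: recover y = 2x + 1 from an endless example stream."""
    xs = cycle([-1.0, -0.5, 0.0, 0.5, 1.0])
    stream = ((x, 2.0 * x + 1.0) for x in xs)
    return minimize_online(squared_error_grad, islice(stream, n_steps), [0.0, 0.0])


if __name__ == "__main__":
    w, b = fit_line_demo()
    print("w = %.4f, b = %.4f" % (w, b))
```

Because `minimize_online` only iterates over `examples`, the same loop works unchanged on a stream that never fits in memory, which is the online setting the notes contrast with batch optimizers.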