changeset 1019:91916536a304

merge
author gdesjardins
date Fri, 03 Sep 2010 15:01:23 -0400
parents 790376d986a3 (current diff) af80b7d182af (diff)
children 53f6eb80abf1
diffstat 6 files changed, 181 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/coding_style.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,16 @@
+Discussion of Coding-Style
+==========================
+
+Participants
+------------
+- Dumitru
+- Fred
+- David
+- Olivier D [leader unless David wants to be]
+
+Existing Python coding style specifications:
+--------------------------------------------
+
+    * http://www.python.org/dev/peps/pep-0008/
+    * http://google-styleguide.googlecode.com/svn/trunk/pyguide.html
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/committees.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,58 @@
+List of committees and their members:
+
+* Existing Python ML libraries investigation: GD, DWF, IG, DE
+* Dataset interface: DE, OB, OD, AB, PV
+* Learners: AB, PL, GM, IG, RP
+* Optimization: JB, PL, OD
+* Inference/sampling: JB, GD, AC
+* Job management, analysis, metrics, costs, visualization: GD, FS, PL, XM
+* Formulas/tags: FB, NB, RP, AC, OB
+* Coding style: DE, OD, DWF, FB
+
+Issues to be tackled in the future:
+
+* serialization & reproducibility 
+* job management, results analysis, metrics & costs, visualization
+* GPU portability
+* social engineering, code review and incentives
+
+Job of each committee:
+
+* name a leader
+* create a text file in this directory associated with their discussion
+* discuss the issues, write them down in this file
+* come up with an interface, protocol, or recommendations
+* bring up the issues and recommendations to the rest of the group, and get feedback
+* make sure the recommendations fit with other committees' recommendations
+* update the discussion / recommendation file
+* write documentation for these conventions in Python or proper doc files as appropriate
+* implement one or a few examples that show what is typically expected
+
+What is the role of a committee (team) leader?
+
+* The committee leader takes responsibility for both the quality and
+  timeliness of the work of the committee.
+* The role of the leader is *not* to do the work of the other
+  committee members, but instead to motivate others to ensure that the
+  aims of the committee are pursued.
+* The leader should lead by example and, when necessary, reinvigorate the group.
+* The leader would preferably be someone who possesses both natural leadership
+  qualities and significant experience in the subject matter of the committee.
+* In choosing the leader, the other committee members must agree to
+  engage in collaboration with, and respect the leadership of, the chosen
+  leader.
+
+Concretely, the leader must:
+
+* Call meetings and set the agenda.
+* Focus the efforts of the group to ensure that the committee's
+  priorities are addressed on schedule.
+* Assign well-defined tasks to team members that are to be completed
+  by a fixed deadline.
+* Ensure that the team objectives are met.
+* Follow up with team members to ensure that tasks are completed and
+  objectives are followed.
+
+
+
+
--- a/doc/v2_planning/dataset.txt	Fri Sep 03 15:01:02 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -1,3 +1,14 @@
 Discussion of Function Specification for Dataset Types
 ======================================================
 
+Some talking points from the September 2 meeting:
+
+ * Datasets as views/tasks (Pascal Vincent's idea): our dataset specification
+ needs to be flexible enough to accommodate different (sub)tasks and views of
+ the same underlying data.
+ * Datasets as probability distributions from which one can sample.
+ * Our specification should allow transparent handling of infinite datasets (or
+ simply datasets which cannot fit in memory).
+ * GPU/buffering issues.
+
+Committee: DE, OB, OD, AB, PV
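
The dataset talking points above (views of the same data, datasets as distributions to sample from, infinite or larger-than-memory data) can be illustrated with a small sketch. This is not the committee's interface: `StreamDataset`, `view`, `minibatches` and `gaussian_pairs` are hypothetical names invented here, and the sketch assumes only that a dataset can be described by a function returning a fresh iterator of examples.

```python
import itertools
import random


class StreamDataset:
    """Hypothetical dataset: wraps a factory that returns a fresh example iterator."""

    def __init__(self, make_iter):
        self._make_iter = make_iter  # callable returning an iterator of examples

    def __iter__(self):
        return self._make_iter()

    def view(self, fn):
        """Return a new dataset that lazily exposes fn(example) for each example."""
        return StreamDataset(lambda: map(fn, self._make_iter()))

    def minibatches(self, size):
        """Yield lists of up to `size` examples without materializing the dataset."""
        it = iter(self)
        while True:
            batch = list(itertools.islice(it, size))
            if not batch:
                return
            yield batch


def gaussian_pairs(seed=0):
    """Infinite dataset: sample (x, y) pairs with y = 2x + small Gaussian noise."""
    rng = random.Random(seed)
    while True:
        x = rng.gauss(0.0, 1.0)
        yield (x, 2.0 * x + rng.gauss(0.0, 0.1))


data = StreamDataset(gaussian_pairs)
inputs_only = data.view(lambda ex: ex[0])  # an "inputs only" view of the same data
first_batches = list(itertools.islice(inputs_only.minibatches(4), 3))
```

Because examples are only produced on demand, the same code path serves a finite file-backed iterator and an infinite sampler; GPU transfer and buffering are left out of this sketch entirely.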
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/existing_python_ml_libraries.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,18 @@
+Committee members: GD, DWF, IG, DE
+
+This committee will investigate the possibility of interfacing and/or
+borrowing from other Python machine learning libraries that exist out there.
+Some questions that we need to answer:
+
+ * How much should we try to interface with other libraries? 
+ * What parts can we and should we implement ourselves and what should we leave
+ to the other libraries?
+
+Preliminary list of libraries to look at:
+
+ * Pybrain
+ * MDP
+ * Orange
+ * Shogun python bindings
+ * libsvm python bindings
+
--- a/doc/v2_planning/main_plan.txt	Fri Sep 03 15:01:02 2010 -0400
+++ b/doc/v2_planning/main_plan.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -2,6 +2,47 @@
 Motivation
 ==========
 
+Yoshua (points discussed Thursday Sept 2, 2010 at LISA tea-talk)
+------
+
+****** Why we need to get better organized in our code-writing ******
+
+- current state of affairs on top of Theano is anarchic and does not lend itself to easy code re-use
+- the lab is growing and will continue to grow significantly, and more people outside the lab are using Theano
+- we have new industrial partners and funding sources that demand deliverables, and more/better collectively organized efforts
+
+*** Who can take advantage of this ***
+
+- us, directly, taking advantage of the different advances made by different researchers in the lab to yield better models
+- us, easier to compare different models and different datasets with different metrics on different computing platforms available to us
+- future us, new students, able to quickly move into 'production' mode without having to reinvent the wheel 
+- students in the two ML classes, able to play with the library to explore new ML variants
+- other ML researchers in academia, able to play with our algorithms, try new variants, cite our papers
+- non-ML users in or out of academia, and our user-partners
+
+
+*** Move with care ***
+
+- Write down use-cases, examples for each type of module, do not try to be TOO general
+- Want to keep ease of exploring and flexibility, not create a prison
+- Too many constraints can lead to paralysis, especially in a C++-style object-oriented model
+- Too few guidelines lead to code components that are not interchangeable
+- Poor coding practice leads to buggy, spaghetti code
+
+*** What ***
+
+- define standards
+- write up a few instances of each basic type (dataset, learner, optimizer, hyper-parameter exploration boilerplate, etc.), enough to implement some of the basic algorithms we use often (e.g. those in the tutorials)
+- let the library grow according to our needs 
+- keep tight reins on it to control quality 
+
+*** Content and Form ***
+
+We need to establish guidelines and conventions for 
+
+ * Content: what are the re-usable components? define conventions or API for each, make sure they fit with each other
+ * Form: social engineering, coding practices and conventions, code review, incentives
+
 Yoshua:
 -------
 
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/optimization.txt	Fri Sep 03 15:01:23 2010 -0400
@@ -0,0 +1,37 @@
+Discussion of Optimization-Related Issues
+=========================================
+
+Members: JB, PL, OD
+
+Representative: JB
+
+
+Previous work: scikits, openopt, and scipy provide function optimization
+algorithms.  These are not currently GPU-enabled but may be in the future.
+
+
+IS PREVIOUS WORK SUFFICIENT?
+--------------------------------
+
+In many cases it is (I used it for sparse coding, and it was ok).
+
+These packages provide batch optimization, whereas we typically need online
+optimization.
+
+It can be faster (to run) and more convenient (to implement) to have
+optimization algorithms as Theano update expressions.
+
+
+What optimization algorithms do we want/need?
+---------------------------------------------
+
+ - sgd 
+ - sgd + momentum
+ - sgd with annealing schedule
+ - TONGA
+ - James Martens' Hessian-free
+
+Do we need anything to make batch algos work better with Pylearn things?
+ - conjugate gradient methods?
+ - L-BFGS?
+
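
As a concrete reading of the first three items in the list above (sgd, sgd + momentum, sgd with an annealing schedule), here is a minimal pure-Python sketch. It does not use Theano; the function names, the 1/t schedule and the toy quadratic objective are all assumptions made for illustration, not choices recorded in this discussion.

```python
def annealed_lr(base_lr, step, tau):
    """Assumed schedule: hold base_lr for the first tau steps, then decay as 1/step."""
    return base_lr * tau / max(tau, step)


def sgd_momentum_step(params, velocity, grads, lr, momentum):
    """One update: v <- momentum * v - lr * g, then p <- p + v. Returns new lists."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def grad_quadratic(params, target):
    """Gradient of the toy objective 0.5 * sum((p - t) ** 2)."""
    return [p - t for p, t in zip(params, target)]


target = [3.0, -1.0]
params = [0.0, 0.0]
velocity = [0.0, 0.0]
for step in range(1, 201):
    lr = annealed_lr(base_lr=0.1, step=step, tau=50)
    grads = grad_quadratic(params, target)
    params, velocity = sgd_momentum_step(params, velocity, grads, lr, momentum=0.9)
```

Setting `momentum=0.0` recovers plain sgd. Each step is written as a pure function from old values to new values, which is close in spirit to expressing the optimizer as a set of update expressions, although the Theano formulation itself is not shown here.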