view doc/v2_planning/architecture.txt @ 1157:9686c0d9689d
Quick implementation of the Dataset Api we propose.
author | Arnaud Bergeron <abergeron@gmail.com> |
---|---|
date | Fri, 17 Sep 2010 12:01:12 -0400 |
parents | 4eda3f52ebef |
children | e5306f5626d4 |
====================
Pylearn Architecture
====================

Basic Design Approach
=====================

I propose that the basic design of the library follow the Symbolic Expression
(SE) structure + virtual machine (VM) pattern that worked for Theano.

So the main things for the library to provide would be:

- a few VMs, some of which can run programs in parallel across processors,
  hosts, and networks [R6,R8];

- MLA components as either individual Expressions (similar to Ops) or as
  subgraphs of SEs [R5,R7,R10,R11];

- machine learning algorithms including their training and testing in the form
  of python functions that build SE graphs [R1,R8].

This design addresses R2 (modularity) because swapping components is literally
implemented by swapping subgraphs.

The design addresses R9 (algorithmic efficiency) because we can write
Theano-style graph transformations to recognize special cases of component
combinations.

The design addresses R3 if we make the additional decision that the VMs (at
least sometimes) cache the return value of program function calls.  This cache
serves as a database of experimental results, indexed by the functions that
originally computed them.  I think this is a very natural scheme for organizing
experiment results, and ensuring experiment reproducibility [R1].  At the same
time, this is a clean and simple API behind which experiments can be saved using
a number of database technologies.

APIs vs. lambda
----------------

Modularity in general is achieved when pieces can be substituted one for the
other.

In an object-oriented design, modularity is achieved by agreeing on interface
APIs, but in a functional design there is another possibility: the lambda.

In an SE these pieces are expression [applications] and the subgraphs they form.
A subgraph is characterized syntactically within the program by its arguments
and its return values.  A lambda function allows the User to create new
Expression types from arbitrary subgraphs with very few keystrokes.
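
To make the SE + VM pattern, the result cache, and the lambda more concrete,
here is a minimal sketch in plain Python.  Every name in it (``Var``, ``Apply``,
``Lambda``, ``CachingVM``, ``OPS``) is hypothetical and invented for
illustration; none of them is a Pylearn or Theano API.  The cache key used here
(the expression plus its input values) is also an assumption about what
"indexed by the functions that originally computed them" could look like.

```python
from dataclasses import dataclass

# All names below are hypothetical and for illustration only.


@dataclass(frozen=True)
class Var:
    """A named input of a symbolic expression graph."""
    name: str


@dataclass(frozen=True)
class Apply:
    """The application of a primitive operation to argument expressions."""
    op: str
    args: tuple


# Primitive operations the toy VM knows how to execute.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def substitute(expr, bindings):
    """Return expr with every bound Var replaced by its binding."""
    if isinstance(expr, Var):
        return bindings.get(expr, expr)
    if isinstance(expr, Apply):
        return Apply(expr.op, tuple(substitute(a, bindings) for a in expr.args))
    return expr  # anything else is treated as a constant


class Lambda:
    """A subgraph packaged as a new expression type.

    The subgraph is characterized only by its arguments (params) and its
    return value (body).
    """

    def __init__(self, params, body):
        self.params = tuple(params)
        self.body = body

    def __call__(self, *args):
        if len(args) != len(self.params):
            raise TypeError("wrong number of arguments")
        return substitute(self.body, dict(zip(self.params, args)))


class CachingVM:
    """A toy VM that caches results, keyed by (expression, input values)."""

    def __init__(self):
        self.cache = {}
        self.evaluations = 0  # counts cache misses

    def run(self, expr, **inputs):
        key = (expr, tuple(sorted(inputs.items())))
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = self._eval(expr, inputs)
        return self.cache[key]

    def _eval(self, expr, inputs):
        if isinstance(expr, Var):
            return inputs[expr.name]
        if isinstance(expr, Apply):
            return OPS[expr.op](*(self._eval(a, inputs) for a in expr.args))
        return expr  # constant


x, y = Var("x"), Var("y")
# New expression type built from a subgraph: affine(x, y) = x * 2 + y
affine = Lambda([x, y], Apply("add", (Apply("mul", (x, 2)), y)))

# Compose it like any other expression: 2*a + (2*b + 1)
graph = affine(Var("a"), affine(Var("b"), 1))

vm = CachingVM()
first = vm.run(graph, a=3, b=4)   # computed by the VM
second = vm.run(graph, a=3, b=4)  # served from the cache
```

In this sketch the second call does not re-evaluate the graph, so the cache
behaves like a small database of results.  A persistent store could sit behind
the same ``run`` interface, but that is beyond what the sketch shows.
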
When a lambda is available and easy to use, there is much less pressure on the
expression library to follow calling and return conventions strictly.

Of course, the closer two subgraphs are in terms of their inputs, outputs, and
semantics, the easier it is to substitute one for the other.  As library
designers, we should still aim for compatibility of similar algorithms.  It is
just not essential to choose an API that guarantees a match, or indeed to
choose any explicit API at all.
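
As a rough illustration of substitution without a shared API, the sketch below
swaps two components whose calling and return conventions differ, using
Python's own ``lambda`` as the adaptor.  Expressions are represented as nested
tuples purely for brevity, and all function names (``evaluate``,
``center_then_scale``, ``scale_and_report``, ``build_model``) are hypothetical,
not part of Pylearn.

```python
# Hypothetical sketch: an expression is a nested tuple ("op", arg, ...),
# a string names an input, and anything else is a constant.

def evaluate(expr, env):
    """Evaluate a nested-tuple expression given values for its inputs."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, tuple):
        op, *args = expr
        a, b = (evaluate(arg, env) for arg in args)
        if op == "add":
            return a + b
        if op == "sub":
            return a - b
        if op == "mul":
            return a * b
        raise ValueError("unknown op: %r" % (op,))
    return expr


def center_then_scale(x, mean, scale):
    """Component A: returns one expression, (x - mean) * scale."""
    return ("mul", ("sub", x, mean), scale)


def scale_and_report(scale, x):
    """Component B: other argument order, returns (expression, extra info)."""
    return ("mul", x, scale), {"centered": False}


def build_model(preprocess):
    """Build a graph that preprocesses input "x" and then adds a bias "b"."""
    return ("add", preprocess("x"), "b")


# The two components share no API, yet each lambda exposes the same
# one-argument subgraph interface that build_model expects.
model_a = build_model(lambda x: center_then_scale(x, "m", "s"))
model_b = build_model(lambda x: scale_and_report("s", x)[0])

env = {"x": 5.0, "m": 1.0, "s": 2.0, "b": 0.5}
result_a = evaluate(model_a, env)
result_b = evaluate(model_b, env)
```

Neither component had to be rewritten to match the other.  The adaptor is one
line per component, which is the sense in which an easy lambda reduces the
pressure to standardize calling and return conventions.
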