view doc/v2_planning/datalearn_pytables.txt @ 1451:8110ca3cec3f

merge
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 31 Mar 2011 18:29:11 -0400
parents 1934ba31b7d9

Big Dataset/Output
==================

This file describes the current plan for handling datasets and outputs that do not fit in memory by using PyTables.


PyTables
--------

We try to fix that problem by making it possible to use PyTables in/with Theano.

Here are the tickets that I plan to do, in order.

    - Fix the ift6266 script to load PNIST data (DONE)
    - Put PNIST in PyTables format (done, first version)
    - Example with PyTables data in Python (done, with modifications needed in Theano)
    - Basic filter on the dataset in Python
    - Example with PyTables output in Python
    - ? Put statistics in the PyTables file to avoid reading the whole file each time we normalize.
      Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
    - Example with PyTables data in Theano
    - Basic filter on the dataset in Theano
    - Example with PyTables output in Theano
    - Plan a way to store the output temporarily (delete it and store locally)

Items 1, 2 and 3 are there to verify that PyTables can do what we want!
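As a rough illustration of the Python-side tickets (PyTables data, PyTables output, and statistics stored in the file), here is a minimal sketch. It assumes the PyTables 3.x function names (``open_file``, ``create_earray``), which differ from the camelCase names of older releases. The file name, the array sizes, and the attribute names ``feature_mean`` and ``feature_std`` are made up for the example and are not part of the plan.

```python
import os
import tempfile

import numpy as np
import tables  # assumes PyTables 3.x naming (open_file, create_earray)

# Hypothetical sizes and file name, chosen only for this illustration.
N_FEATURES = 8
CHUNK_ROWS = 100
N_CHUNKS = 5

path = os.path.join(tempfile.mkdtemp(), "toy_dataset.h5")
rng = np.random.RandomState(0)

# Write the data chunk by chunk, so it never has to be in memory all at once.
with tables.open_file(path, mode="w") as h5:
    data = h5.create_earray(
        h5.root, "data",
        atom=tables.Float32Atom(),
        shape=(0, N_FEATURES),  # the first dimension is extendable
        expectedrows=CHUNK_ROWS * N_CHUNKS,
    )
    total = np.zeros(N_FEATURES, dtype="float64")
    total_sq = np.zeros(N_FEATURES, dtype="float64")
    for _ in range(N_CHUNKS):
        chunk = rng.normal(3.0, 2.0, size=(CHUNK_ROWS, N_FEATURES))
        chunk = chunk.astype("float32")
        data.append(chunk)
        # Accumulate running sums so the statistics need a single pass.
        total += chunk.sum(axis=0, dtype="float64")
        total_sq += (chunk.astype("float64") ** 2).sum(axis=0)
    mean = total / data.nrows
    std = np.sqrt(total_sq / data.nrows - mean ** 2)
    # Store the statistics next to the data as node attributes
    # (attribute names are hypothetical).
    data.attrs.feature_mean = mean
    data.attrs.feature_std = std

# Read back one minibatch; only the requested rows are loaded from disk.
with tables.open_file(path, mode="r") as h5:
    data = h5.root.data
    n_rows_on_disk = data.nrows
    stored_mean = data.attrs.feature_mean
    stored_std = data.attrs.feature_std
    minibatch = data[200:232]  # NumPy array of shape (32, N_FEATURES)
    normalized = (minibatch - stored_mean) / stored_std
```

Keeping the statistics as attributes of the array node means normalization only needs the slice being read, not a full pass over the file. Another mechanism could replace this without changing the reading code much.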

Here is the current plan:

    - Make a PyTableVariable
    - Make a PyTableSubtensor op (or reuse the current one) to allow taking a
      slice of the new variable
    - Allow scan to work with this new PyTableVariable as input
    - Allow scan to work with this new PyTableVariable as output

At first, there will be no in-place ops, so view_map and destroy_map will not be supported. The way I see it, this is an interface to a file inside Theano. No direct modification is allowed at first.
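To make the "interface to a file" idea concrete, here is a plain-Python sketch of the intended semantics: slicing returns a copy, and writes are refused. This is not Theano code and not the planned PyTableVariable itself. The names ``ReadOnlyFileView`` and ``iter_minibatches`` are hypothetical, and a NumPy array stands in for the on-disk PyTables node so that the example stays self-contained.

```python
import numpy as np


class ReadOnlyFileView(object):
    """Hypothetical stand-in for the planned PyTableVariable.

    Wraps any backing store that has ``shape`` and supports slicing
    (for example a PyTables array node) and exposes reads only.
    """

    def __init__(self, backing):
        self._backing = backing

    @property
    def shape(self):
        return tuple(self._backing.shape)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, index):
        # Always return a fresh copy, so callers never alias the stored
        # data (a plain-Python analogue of having no view_map).
        return np.array(self._backing[index], copy=True)

    def __setitem__(self, index, value):
        # Refuse in-place modification (analogue of having no destroy_map).
        raise TypeError("ReadOnlyFileView does not allow modification")


def iter_minibatches(view, batch_size):
    """Yield consecutive slices, as a scan-like loop would consume them."""
    for start in range(0, len(view), batch_size):
        yield view[start:start + batch_size]


# A NumPy array stands in for the on-disk node in this sketch.
backing = np.arange(20, dtype="float32").reshape(10, 2)
view = ReadOnlyFileView(backing)

batches = list(iter_minibatches(view, batch_size=4))
batches[0][0, 0] = -1.0  # changes the copy only; the backing store is untouched
```

Returning copies costs some memory per slice, but it keeps the file contents safe until in-place support is designed properly.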

Getting the code
----------------

- Clone FB's repo:

    .. code-block:: bash

        git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz

- Create a local branch that tracks the remote branch:

    .. code-block:: bash

        git checkout -b variants origin/pytables