Big Dataset/Output
==================

This file describes the current plan for handling datasets and outputs that do not fit in memory, using PyTables.


PyTables
--------

We address this problem by making PyTables usable from within Theano.

Here are the tickets that I plan to work on, in order:

    - Fix ift6266 script to load PNIST data (DONE)
    - Put PNIST in PyTables format
    - Example with PyTables data in Python
    - Basic filter on the dataset in Python
    - Example with PyTables output in Python
    - ? Store statistics in the PyTables file, so that we do not need to read the whole file each time we normalize.
      We may try another mechanism later, but I will start with this one as it seems simple to do.
    - Example with PyTables data in Theano
    - Basic filter on the dataset in Theano
    - Example with PyTables output in Theano
    - Plan a way to store the output temporarily (delete it and store it locally)

Items 1, 2 and 3 are there to verify that PyTables can do what we want!
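
As a rough illustration of what the Python-side tickets above could look like, here is a minimal sketch. It assumes the PyTables 3.x function names (``open_file``, ``create_earray``) and NumPy; the file name, array name, sizes and the random toy data are all made up for the example and are not taken from the PNIST scripts.

```python
import os
import tempfile

import numpy as np
import tables  # assumes PyTables 3.x naming (open_file, create_earray)

rng = np.random.RandomState(0)
n_rows, n_features, chunk = 1000, 32, 100  # toy sizes, chosen arbitrarily

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.h5")

# Write the dataset chunk by chunk, so it never has to be in memory at once.
with tables.open_file(path, mode="w") as h5:
    x = h5.create_earray(h5.root, "x", atom=tables.Float32Atom(),
                         shape=(0, n_features), expectedrows=n_rows)
    total = np.zeros(n_features)
    for _ in range(n_rows // chunk):
        block = rng.rand(chunk, n_features).astype("float32")
        x.append(block)
        total += block.sum(axis=0)
    # Store the statistics next to the data, so that normalizing later
    # does not require another pass over the whole file.
    x.attrs.mean = total / n_rows

# Read the data back one minibatch at a time.
with tables.open_file(path, mode="r") as h5:
    x = h5.root.x
    mean = x.attrs.mean
    n_stored = x.nrows
    batch = x[200:300] - mean      # only these 100 rows are read from disk
    keep = batch[:, 0] > 0         # basic filter applied to the loaded chunk
    filtered = batch[keep]
```

Slicing the array node returns an ordinary NumPy array, which is what makes the "read one minibatch, then filter it" pattern straightforward. Writing model outputs could follow the same append-by-chunk pattern shown for the input data.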

Here is the current plan:

    - Make a PyTableVariable
    - Make a PyTableSubtensor op (or reuse the current one) to allow taking a
      slice of the new variable
    - Allow scan to work with this new PyTableVariable as input
    - Allow scan to work with this new PyTableVariable as output

At first, there will be no inplace ops, so view_map and destroy_map will not be supported. The way I see it, this is an interface to a file inside Theano. No direct modification will be allowed at first.
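
The "interface to a file, no direct modification" idea can be sketched in plain Python. This is not Theano code and not the planned implementation: ``ReadOnlyFileVariable`` and ``iter_minibatches`` are hypothetical names, and a NumPy array stands in for the file so that the example runs without any file on disk.

```python
import numpy as np


class ReadOnlyFileVariable(object):
    """Hypothetical stand-in for the planned PyTableVariable (not Theano API).

    It wraps any storage that has a ``shape`` and supports slicing (a
    PyTables array node would qualify) and only exposes reads.
    """

    def __init__(self, storage):
        self._storage = storage

    @property
    def shape(self):
        return tuple(self._storage.shape)

    def __getitem__(self, index):
        # Always hand out a copy, never a view: no result aliases the
        # underlying storage, so no view or inplace bookkeeping is needed.
        return np.array(self._storage[index], copy=True)

    def __setitem__(self, index, value):
        raise TypeError("direct modification is not allowed")


def iter_minibatches(var, batch_size):
    """Yield consecutive row slices, roughly what a scan over the rows needs."""
    for start in range(0, var.shape[0], batch_size):
        yield var[start:start + batch_size]


backing = np.arange(20, dtype="float32").reshape(10, 2)  # stands in for a file
var = ReadOnlyFileVariable(backing)

first = var[0:4]
first[:] = -1  # only the copy changes, the backing storage is untouched
sizes = [b.shape[0] for b in iter_minibatches(var, 4)]
```

Returning copies costs some memory per slice, but it keeps the first version simple: every value taken from the variable is independent of the file.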

Getting the code
----------------

- clone FB repo:

    .. code-block:: bash

        git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz

- create a local branch that tracks the remote branch:

    .. code-block:: bash

        git checkout -b variants origin/pytables
author	Frederic Bastien <nouiz@nouiz.org>
date	Tue, 11 Jan 2011 16:43:10 -0500