view doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b

new file about datalearn in pytables.
author Frederic Bastien <nouiz@nouiz.org>
date Mon, 10 Jan 2011 14:55:39 -0500

Big Dataset/Output
==================

This file describes our current plan for handling datasets and outputs that don't fit in memory, using PyTables.


PyTables
--------

We plan to address this problem by making PyTables usable in/with Theano.

Here are the tickets that I plan to work on, in order.

-1) Fix the ift6266 script to load PNIST data (DONE)
0) Put PNIST in PyTables format
1) example with PyTables data in Python
2) basic filter on the dataset in Python
3) example with PyTables output in Python
?) put stats in the PyTables file so we don't have to read the whole file each time we normalize
   Maybe we will try another mechanism later, but I will start with this one as it seems simple to do.
4) example with PyTables data in Theano
5) basic filter on the dataset in Theano
6) example with PyTables output in Theano
7) plan a way to store the output temporarily (delete it and store it locally)

Tickets 1, 2 and 3 are there to verify that PyTables can do what we want!
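
As a rough idea of what tickets 1, 2, 3 and the stats ticket could look like in plain Python, here is a minimal sketch. It is not the script from the branch. It assumes the PyTables 3.x function names (open_file, create_earray; older releases spell them openFile, createEArray) and uses made-up toy data instead of PNIST. The function names, node names ("x", "y", "out") and sizes are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables 3.x naming is assumed here

N_FEATURES = 4    # hypothetical toy size, not the real PNIST shape
CHUNK_ROWS = 100  # rows written or read per step


def write_dataset(path, n_chunks=5, seed=0):
    """Write a toy dataset chunk by chunk so it never sits fully in memory."""
    rng = np.random.RandomState(seed)
    with tables.open_file(path, mode="w") as h5:
        # Extendable arrays: the dimension declared as 0 grows on append().
        x = h5.create_earray(h5.root, "x", atom=tables.Float32Atom(),
                             shape=(0, N_FEATURES))
        y = h5.create_earray(h5.root, "y", atom=tables.Int32Atom(),
                             shape=(0,))
        total = np.zeros(N_FEATURES)
        total_sq = np.zeros(N_FEATURES)
        for _ in range(n_chunks):
            data = rng.rand(CHUNK_ROWS, N_FEATURES).astype("float32")
            x.append(data)
            y.append(rng.randint(0, 10, size=CHUNK_ROWS).astype("int32"))
            data64 = data.astype("float64")
            total += data64.sum(axis=0)
            total_sq += (data64 ** 2).sum(axis=0)
        # Stats ticket: keep mean/std next to the data so that normalizing
        # later does not need another pass over the file.
        n = int(x.nrows)
        mean = total / n
        x.attrs.mean = mean
        x.attrs.std = np.sqrt(np.maximum(total_sq / n - mean ** 2, 0.0))


def filter_to_output(in_path, out_path, label):
    """Stream the rows with the given label, normalized, into an output file."""
    with tables.open_file(in_path, mode="r") as src, \
            tables.open_file(out_path, mode="w") as dst:
        x, y = src.root.x, src.root.y
        mean, std = x.attrs.mean, x.attrs.std
        out = dst.create_earray(dst.root, "out", atom=tables.Float64Atom(),
                                shape=(0, N_FEATURES))
        for start in range(0, int(x.nrows), CHUNK_ROWS):
            xs = x[start:start + CHUNK_ROWS]  # only this slice is read
            ys = y[start:start + CHUNK_ROWS]
            rows = (xs[ys == label] - mean) / std
            if len(rows):
                out.append(rows)
        return int(out.nrows)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src_path = os.path.join(tmp, "data.h5")
    write_dataset(src_path)
    kept = filter_to_output(src_path, os.path.join(tmp, "out.h5"), label=3)
    print("rows kept:", kept)
```

Only one chunk of input and one chunk of output is held in memory at a time, which is the property the tickets are meant to check.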

Here is the current plan.
- Make a PyTableVariable
- Make a PyTableSubtensor op (or reuse the current one) to allow taking a
  slice of the new variable
- Allow scan to work with this new PyTableVariable as input
- Allow scan to work with this new PyTableVariable as output

At first there will be no inplace ops, so view_map and destroy_map won't be supported. The way I see it, this is an interface to a file inside Theano. No direct modification is allowed at first.
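
To illustrate the "interface to a file, no direct modification" idea outside of Theano, here is a small pure-Python sketch. It is only an analogy and not the planned PyTableVariable. The class ReadOnlyFileMatrix, the helper write_matrix and the flat float64 file layout are all hypothetical. Data can be read by contiguous slice, and there is deliberately no way to assign to it.

```python
import os
import struct
import tempfile


def write_matrix(path, rows):
    """Helper for the demo: dump rows of floats as little-endian float64."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack("<%dd" % len(row), *row))


class ReadOnlyFileMatrix:
    """Hypothetical file-backed matrix that can only be read by slice."""

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols
        self._row_bytes = 8 * n_cols  # 8 bytes per float64 value
        self.n_rows = os.path.getsize(path) // self._row_bytes

    def __len__(self):
        return self.n_rows

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop, step = key.indices(self.n_rows)
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        n = max(stop - start, 0)
        # Seek to the first requested row and read only the bytes needed.
        with open(self.path, "rb") as f:
            f.seek(start * self._row_bytes)
            raw = f.read(n * self._row_bytes)
        flat = struct.unpack("<%dd" % (n * self.n_cols), raw)
        return [list(flat[i * self.n_cols:(i + 1) * self.n_cols])
                for i in range(n)]

    # There is no __setitem__ on purpose: "m[0:1] = ..." raises TypeError,
    # which mirrors the "no direct modification" rule.


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "m.bin")
    write_matrix(path, [[float(i), 2.0 * i] for i in range(10)])
    m = ReadOnlyFileMatrix(path, n_cols=2)
    print(m[2:4])
```

Because nothing can write through the object, there is no aliasing between a slice and the file, which is why view_map and destroy_map are not needed in such a first version.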

# clone OD repo
git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
cd Pylearn.nouiz
# create a local branch that tracks the remote branch
git checkout -b variants origin/pytables