view doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b
new file about datalearn in pytables.
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Mon, 10 Jan 2011 14:55:39 -0500 |
parents | |
children | 702a933794f7 |
Big Dataset/Output
==================

This file describes the current plan for handling datasets and outputs that
don't fit in memory, using PyTables.

PyTables
--------

We try to fix that problem by allowing PyTables to be used in/with Theano.

Here are the tickets that I plan to do. They are in order.

-1) Fix the ift6266 script to load the PNIST data (DONE)
0) Put PNIST in PyTables format
1) example with PyTables data in python
2) basic filter on the dataset in python
3) example with PyTables output in python
?) put stats in the PyTables file so that we don't have to read the whole file each time we normalize.
   Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
4) example with PyTables data in theano
5) basic filter on the dataset in theano
6) example with PyTables output in theano
7) plan a way to store the output temporarily (delete it and store locally)

Tickets 1, 2 and 3 are there to verify that PyTables can do what we want!

Here is the current plan.

- Make a PyTableVariable
- Make a PyTableSubtensor op (or reuse the current one) to allow taking a slice of the new variable
- Allow scan to work with this new PyTableVariable as input
- Allow scan to work with this new PyTableVariable as output

At first there will be no inplace ops, so view_map and destroy_map won't work.

The way I see it, this is an interface to a file inside theano. No direct modification is allowed at first.

#clone OD repo
git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
#create a local branch that tracks the remote branch
git checkout -b variants origin/pytables
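
The "interface to a file" idea in the plan above (a variable that can be sliced but not modified, read in pieces, with stats computed without holding everything in memory) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a flat binary file of float32 rows as a stand-in for a PyTables file, it does not use Theano, and every name in it (ReadOnlyFileVariable, write_rows, minibatches, column_means) is hypothetical rather than part of pylearn, PyTables or Theano.

```python
import os
import struct


def write_rows(path, rows):
    """Write equal-length rows of numbers as little-endian float32 (demo helper)."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack("<%df" % len(row), *row))


class ReadOnlyFileVariable:
    """Hypothetical stand-in for the planned PyTableVariable.

    The data stays on disk; only the rows requested by a slice are read.
    """

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols
        self._row_fmt = "<%df" % n_cols
        self._row_size = struct.calcsize(self._row_fmt)
        self.n_rows = os.path.getsize(path) // self._row_size

    def __len__(self):
        return self.n_rows

    def __getitem__(self, key):
        # Only contiguous slices, mirroring the "take a slice" op in the plan.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop, step = key.indices(self.n_rows)
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        rows = []
        with open(self.path, "rb") as f:
            f.seek(start * self._row_size)
            for _ in range(max(0, stop - start)):
                rows.append(struct.unpack(self._row_fmt, f.read(self._row_size)))
        return rows

    def __setitem__(self, key, value):
        # No direct modification: the variable is a read-only view of the file.
        raise TypeError("ReadOnlyFileVariable does not allow modification")


def minibatches(var, batch_size):
    """Yield consecutive slices so that one batch at a time is in memory."""
    for start in range(0, len(var), batch_size):
        yield var[start:start + batch_size]


def column_means(var, batch_size):
    """Per-column means computed batch by batch (the kind of stat one could cache)."""
    if len(var) == 0:
        raise ValueError("cannot compute means of an empty variable")
    totals = [0.0] * var.n_cols
    for batch in minibatches(var, batch_size):
        for row in batch:
            for j, value in enumerate(row):
                totals[j] += value
    return [t / len(var) for t in totals]
```

A caller would first create a file with write_rows, wrap it in ReadOnlyFileVariable(path, n_cols), and then iterate with minibatches or compute column_means once and store the result next to the data, which is the motivation behind the "put stats in the file" ticket.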