view doc/v2_planning/datalearn_pytables.txt @ 1517:a6e634b83d88
allow to read filetensor compressed with bz2
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Wed, 09 May 2012 11:56:28 -0400 |
parents | 1934ba31b7d9 |
children |
Big Dataset/Output
==================

This file describes the current plan for handling datasets and outputs that
don't fit in memory, using PyTables.

PyTables
--------

We try to fix that problem by allowing PyTables to be used in/with Theano.

Here are the tickets that I plan to do, in order:

- Fix ift6266 script to load PNIST data (DONE)
- Put PNIST in PyTables format (done, first version)
- Example with PyTables data in Python (done, with modifications needed in Theano)
- Basic filter on the dataset in Python
- Example with PyTables output in Python
- ? Put stats in the PyTables file so that we don't have to read the whole
  file each time to normalize. Maybe we will try another mechanism, but I
  will start with this one as it seems simple to do.
- Example with PyTables data in Theano
- Basic filter on the dataset in Theano
- Example with PyTables output in Theano
- Plan a way to store the output temporarily (delete it and store locally)

The first three items are there to verify that PyTables can do what we want!

Here is the current plan:

- Make a PyTableVariable
- Make a PyTableSubtensor op (or reuse the current one) to allow taking a
  slice of the new variable
- Allow scan to work with this new PyTableVariable as input
- Allow scan to work with this new PyTableVariable as output

At first, there will be no in-place ops, so view_map and destroy_map won't
be supported. The way I see it, this is an interface to a file inside Theano.
No direct modification is allowed at first.

Getting the code
----------------

- Clone FB's repo:

  .. code-block:: bash

     git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz

- Create a local branch that tracks the remote branch:

  .. code-block:: bash

     git checkout -b variants origin/pytables
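
Sketch: PyTables data and stats in Python
-----------------------------------------

The "example with PyTables data in Python" and "put stats in the PyTables
file" items from the ticket list could look roughly like the sketch below.
It is only an illustration, not the code from the branch: the file layout
(a ``data`` array and a ``mean`` array under the root), the helper names
``write_dataset``, ``iter_minibatches`` and ``store_stats``, and the random
toy data are all assumptions. It also assumes the snake_case API of
PyTables 3.0 or later (``open_file``, ``create_earray``); older releases
spell these ``openFile`` and ``createEArray``.

.. code-block:: python

   import os
   import tempfile

   import numpy as np
   import tables  # assumption: PyTables >= 3.0 (snake_case API)


   def write_dataset(path, n_rows=1000, n_cols=8, chunk=100, seed=0):
       """Write a toy dataset to disk chunk by chunk (never all in memory)."""
       rng = np.random.RandomState(seed)
       with tables.open_file(path, mode="w") as h5:
           # An EArray can grow along its first axis (the one of length 0).
           data = h5.create_earray(h5.root, "data",
                                   atom=tables.Float32Atom(),
                                   shape=(0, n_cols),
                                   expectedrows=n_rows)
           for _ in range(n_rows // chunk):
               data.append(rng.rand(chunk, n_cols).astype("float32"))


   def iter_minibatches(node, batch_size):
       """Yield consecutive row slices; only one slice is read at a time."""
       for start in range(0, node.nrows, batch_size):
           yield node[start:start + batch_size]


   def store_stats(path, batch_size=100):
       """Compute the per-column mean in one pass and store it in the file."""
       with tables.open_file(path, mode="a") as h5:
           data = h5.root.data
           total = np.zeros(data.shape[1], dtype="float64")
           for batch in iter_minibatches(data, batch_size):
               total += batch.sum(axis=0)
           mean = total / data.nrows
           # Stored next to the data so later runs can normalize
           # without re-reading the whole file.
           h5.create_array(h5.root, "mean", obj=mean)
       return mean


   if __name__ == "__main__":
       path = os.path.join(tempfile.mkdtemp(), "toy.h5")
       write_dataset(path)
       print(store_stats(path))

Slicing the on-disk array (``node[start:stop]``) is the same operation that a
PyTableSubtensor op would need to expose inside Theano, which is why the
Python-only version is a useful first check.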