# HG changeset patch
# User Frederic Bastien
# Date 1294689339 18000
# Node ID 310e22d7e44b8323948b6b5fb41810b582049697
# Parent 54b2268db0d739a0aa1f85c73a39746d4f256281
new file about datalearn in pytables.

diff -r 54b2268db0d7 -r 310e22d7e44b doc/v2_planning/datalearn_pytables.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/datalearn_pytables.txt	Mon Jan 10 14:55:39 2011 -0500
@@ -0,0 +1,40 @@
+Big Dataset/Output
+==================
+
+This file describes the current plan for handling datasets and outputs that don't fit in memory, using PyTables.
+
+
+PyTables
+--------
+
+We try to fix that problem by allowing PyTables to be used in/with Theano.
+
+Here are the tickets that I plan to do. They are in order.
+
+-1) Fix ift6266 script to load PNIST data (DONE)
+0) Put PNIST in PyTables format
+1) example with PyTables data in python
+2) basic filter in the dataset in python
+3) example with PyTables output in python
+?) put stats in the PyTables file so we don't read the whole file each time to normalize
+   Maybe we will try another mechanism. But I will start with this one as it seems simple to do.
+4) example with PyTables data in theano
+5) basic filter in the dataset in theano
+6) example with PyTables output in theano
+7) plan a way to store the output temporarily (delete it and store locally)
+
+Tickets 1, 2 and 3 are there to verify that PyTables can do what we want!
+
+Here is the current plan.
+- Make a PyTableVariable
+- Make a PyTableSubtensor op (or reuse the current one) to allow taking a
+slice of the new variable
+- Allow scan to work with this new PyTableVariable as input
+- Allow scan to work with this new PyTableVariable as output
+
+At first, no inplace ops, so view_map and destroy_map won't be supported. The way I see it, this is an interface to a file inside Theano. No direct modification is allowed at first.
+
+#clone OD repo
+git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
+#create a local branch that tracks the remote branch
+git checkout -b variants origin/pytables
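
As a rough illustration of tickets 1, 3 and "?" (pure Python, no Theano yet), here is a minimal sketch of writing data to a PyTables file, computing normalization stats chunk by chunk, and caching them as attributes on the array node so that later readers don't need to scan the file again. It assumes the modern PyTables 3.x API (open_file, create_earray). The node name "data", the attribute names "mean" and "std", and all function names are hypothetical choices for this example, not part of the plan above.

```python
import numpy as np


def chunked_mean_std(data, chunk_rows=1024):
    """Per-column mean/std of a 2-D array, reading chunk_rows rows at a time.

    `data` only needs `.shape` and row slicing, so it can be a NumPy array
    or an on-disk PyTables array node.
    """
    n_rows = data.shape[0]
    total = np.zeros(data.shape[1:], dtype=np.float64)
    total_sq = np.zeros(data.shape[1:], dtype=np.float64)
    for start in range(0, n_rows, chunk_rows):
        # Only this slice is loaded into memory.
        chunk = np.asarray(data[start:start + chunk_rows], dtype=np.float64)
        total += chunk.sum(axis=0)
        total_sq += (chunk ** 2).sum(axis=0)
    mean = total / n_rows
    # Clamp tiny negative values caused by floating-point rounding.
    var = np.maximum(total_sq / n_rows - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def write_with_stats(path, rows, chunk_rows=1024):
    """Store `rows` (2-D float array) in an HDF5 file and cache its stats."""
    import tables  # local import: PyTables is only needed for file access

    with tables.open_file(path, mode="w") as h5:
        # Extendable along axis 0, so more rows can be appended later.
        node = h5.create_earray(h5.root, "data",  # "data" is a hypothetical name
                                atom=tables.Float32Atom(),
                                shape=(0, rows.shape[1]))
        node.append(np.asarray(rows, dtype=np.float32))
        mean, std = chunked_mean_std(node, chunk_rows)
        node.attrs.mean = mean  # hypothetical attribute names
        node.attrs.std = std


def load_normalized(path, start, stop):
    """Read rows [start, stop) and normalize them with the cached stats."""
    import tables

    with tables.open_file(path, mode="r") as h5:
        node = h5.root.data
        batch = node[start:stop]  # reads only the requested rows from disk
        std = np.where(node.attrs.std > 0, node.attrs.std, 1.0)
        return (batch - node.attrs.mean) / std
```

With this layout, something like write_with_stats("pnist.h5", rows) would be run once, and each minibatch would then come from load_normalized("pnist.h5", start, stop). The file name is again only a placeholder. Whether the stats should live in node attributes or in a separate table is left open, as the ticket itself says another mechanism may be tried.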