diff doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b
new file about datalearn in pytables.
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Mon, 10 Jan 2011 14:55:39 -0500 |
parents | |
children | 702a933794f7 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/datalearn_pytables.txt Mon Jan 10 14:55:39 2011 -0500
@@ -0,0 +1,40 @@
+Big Dataset/Output
+==================
+
+This file describes how we plan to handle datasets/outputs that don't fit in memory, using PyTables.
+
+
+PyTables
+--------
+
+We try to fix that problem by allowing PyTables to be used in/with Theano.
+
+Here are the tickets that I plan to do, in order.
+
+-1) Fix ift6266 script to load PNIST data (DONE)
+0) Put PNIST in PyTables format
+1) example with PyTables data in python
+2) basic filter on the dataset in python
+3) example with PyTables output in python
+?) put stats in the PyTables file so we don't have to read the whole file each time to normalize
+   Maybe we will try another mechanism. But I will start with this one as it seems simple to do.
+4) example with PyTables data in theano
+5) basic filter on the dataset in theano
+6) example with PyTables output in theano
+7) plan a way to store the output temporarily (delete it and store locally)
+
+Steps 1, 2 and 3 are there to verify that PyTables can do what we want!
+
+Here is the current plan.
+- Make a PyTableVariable
+- Make a PyTableSubtensor op (or reuse the current one) to allow taking a
+slice on the new variable
+- Allow scan to work with this new PyTableVariable as input
+- Allow scan to work with this new PyTableVariable as output
+
+At first, no inplace ops, so view_map and destroy_map won't work. The way I see it is as an interface to a file inside theano. No direct modification is allowed at first.
+
+#clone OD repo
+git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
+#create a local branch that tracks the remote branch
+git checkout -b variants origin/pytables
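
The pure-Python tickets in the plan (1, 3 and the "?" item about stored stats) can be sketched roughly as below. This is only an illustration and is not taken from the pylearn repository: the helper names write_dataset and iter_minibatches, the node name "x", the attribute names mean and std, and the random demo data are all assumptions. It uses only standard PyTables calls (open_file, create_earray, node attributes) and does not touch Theano or the planned PyTableVariable.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables


def write_dataset(path, chunks, n_features):
    """Append chunks to an extendable on-disk array and cache its statistics.

    `chunks` is any iterable of 2-D arrays, so the full dataset never has
    to be held in memory at once. (Hypothetical helper, for illustration.)
    """
    total = np.zeros(n_features, dtype=np.float64)
    total_sq = np.zeros(n_features, dtype=np.float64)
    with tables.open_file(path, mode="w") as h5:
        arr = h5.create_earray(
            h5.root, "x",
            atom=tables.Float32Atom(),
            shape=(0, n_features),  # the first axis is the extendable one
        )
        for chunk in chunks:
            chunk = np.asarray(chunk, dtype=np.float32)
            arr.append(chunk)
            # Running sums let us compute mean/std without a second pass.
            total += chunk.sum(axis=0, dtype=np.float64)
            total_sq += (chunk.astype(np.float64) ** 2).sum(axis=0)
        mean = total / arr.nrows
        var = np.maximum(total_sq / arr.nrows - mean ** 2, 0.0)
        # Store the stats next to the data so later runs do not need to
        # read the whole file just to normalize.
        arr.attrs.mean = mean
        arr.attrs.std = np.sqrt(var)


def iter_minibatches(path, batch_size):
    """Yield normalized minibatches; only one slice is read at a time."""
    with tables.open_file(path, mode="r") as h5:
        arr = h5.root.x
        mean, std = arr.attrs.mean, arr.attrs.std
        for start in range(0, arr.nrows, batch_size):
            batch = arr[start:start + batch_size]  # reads only these rows
            yield (batch - mean) / std


# Small demo on random data (the data itself is made up).
rng = np.random.RandomState(0)
data = rng.rand(1000, 8).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
write_dataset(path, (data[i:i + 256] for i in range(0, 1000, 256)), n_features=8)
batches = list(iter_minibatches(path, batch_size=100))
```

Slicing the on-disk array is the operation a PyTableSubtensor op would presumably wrap, and keeping the stats as node attributes is one possible version of the "?" item; another mechanism could replace it without changing the reading loop.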