diff doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b

new file about datalearn in pytables.
author Frederic Bastien <nouiz@nouiz.org>
date Mon, 10 Jan 2011 14:55:39 -0500
parents
children 702a933794f7
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/datalearn_pytables.txt	Mon Jan 10 14:55:39 2011 -0500
@@ -0,0 +1,40 @@
+Big Dataset/Output
+==================
+
+This file describes the current plan for using PyTables to handle datasets and outputs that don't fit in memory.
+
+
+PyTables
+--------
+
+We plan to fix that problem by making PyTables usable from within Theano.
+
+Here are the tickets that I plan to work on, in order.
+
+-1) Fix ift6266 script to load PNIST data (DONE)
+0) Put PNIST in PyTables format
+1) Example with PyTables data in Python
+2) Basic filter on the dataset in Python
+3) Example with PyTables output in Python
+?) Store statistics in the PyTables file so we don't have to read the whole file each time we normalize
+   Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
+4) Example with PyTables data in Theano
+5) Basic filter on the dataset in Theano
+6) Example with PyTables output in Theano
+7) Plan a way to store the output temporarily (delete it and store locally)
+
+Tickets 1, 2 and 3 are there to verify that PyTables can do what we want!
+
+Here is the current plan.
+- Make a PyTableVariable
+- Make a PyTableSubtensor op (or reuse the current Subtensor op) to allow taking a
+slice of the new variable
+- Allow scan to work with this new PyTableVariable as input
+- Allow scan to work with this new PyTableVariable as output
+
+At first there will be no inplace ops, so view_map and destroy_map won't be supported. I see this as an interface to a file inside Theano. No direct modification will be allowed at first.
+
+#clone OD repo
+git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
+#create a local branch that tracks the remote branch
+git checkout -b variants origin/pytables
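
Tickets 1 to 3 and the "stats" ticket from the plan above could be prototyped in plain Python roughly as follows. This is only a sketch under assumptions, not code from the repository: the function names (write_dataset, read_normalized, iter_filtered), the node name "x" and the attribute names "mean" and "std" are hypothetical. It uses the PyTables 3.x spellings open_file and create_earray; PyTables releases from 2011 spelled them openFile and createEArray.

```python
import numpy as np
import tables  # PyTables


def write_dataset(path, data, chunk_rows=100):
    """Append `data` to an HDF5 file chunk by chunk and store per-feature stats."""
    n_features = data.shape[1]
    with tables.open_file(path, mode="w") as h5:
        # An EArray can grow along the dimension declared with size 0.
        x = h5.create_earray(h5.root, "x", atom=tables.Float64Atom(),
                             shape=(0, n_features))
        total = np.zeros(n_features)
        total_sq = np.zeros(n_features)
        for start in range(0, data.shape[0], chunk_rows):
            chunk = data[start:start + chunk_rows]
            x.append(chunk)
            total += chunk.sum(axis=0)
            total_sq += (chunk ** 2).sum(axis=0)
        n = x.nrows
        mean = total / n
        std = np.sqrt(np.maximum(total_sq / n - mean ** 2, 0.0))
        # Keep the statistics next to the data, so that normalizing later
        # does not require another pass over the whole file.
        x.attrs.mean = mean
        x.attrs.std = std


def read_normalized(path, start, stop):
    """Read only rows [start, stop) from disk and normalize them."""
    with tables.open_file(path, mode="r") as h5:
        x = h5.root.x
        batch = x[start:stop]  # slicing reads just these rows
        return (batch - x.attrs.mean) / x.attrs.std


def iter_filtered(path, predicate, chunk_rows=100):
    """Yield, chunk by chunk, the rows for which `predicate` is True."""
    with tables.open_file(path, mode="r") as h5:
        x = h5.root.x
        for start in range(0, x.nrows, chunk_rows):
            chunk = x[start:start + chunk_rows]
            yield chunk[predicate(chunk)]
```

For example, read_normalized("pnist.h5", 0, 128) would load and normalize a single minibatch without touching the rest of the file (the file name here is made up). A PyTableVariable in Theano would presumably wrap the same kind of slice access.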