comparison doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b

new file about datalearn in pytables.
author Frederic Bastien <nouiz@nouiz.org>
date Mon, 10 Jan 2011 14:55:39 -0500
parents
children 702a933794f7
1 Big Dataset/Output
2 ==================
3
4 This file describes our current plan for using PyTables to handle datasets and outputs that do not fit in memory.
5
6
7 PyTables
8 --------
9
10 We aim to solve this problem by making PyTables usable from within Theano.
11
12 Here are the tickets I plan to work on, in order.
13
14 -1) Fix the ift6266 script to load PNIST data (DONE)
15 0) Put PNIST in PyTables format
16 1) example with PyTables data in Python
17 2) basic filter on the dataset in Python
18 3) example with PyTables output in Python
19 ?) store stats in the PyTables file so we don't have to read the whole file each time we normalize
20 Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
21 4) example with PyTables data in Theano
22 5) basic filter on the dataset in Theano
23 6) example with PyTables output in Theano
24 7) plan a way to store the output temporarily (delete it and store locally)
25
26 Steps 1, 2, and 3 are there to verify that PyTables can do what we want.
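
The "?" item above (keeping normalization stats so the data file is not re-read each time) can be sketched in plain Python. This is only an illustration under stated assumptions: the name chunked_mean_std is hypothetical, the nested lists stand in for chunks read from disk, and a dict stands in for wherever the stats would actually be stored in the PyTables file.

```python
import math

def chunked_mean_std(chunks):
    """Return (count, mean, std) over all values, one chunk at a time.

    Uses Welford's running update, so only one chunk has to be in
    memory at any moment. The std is the population standard deviation.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    std = math.sqrt(m2 / count) if count else 0.0
    return count, mean, std

# Hypothetical stand-in for a dataset that is read from disk in pieces.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
count, mean, std = chunked_mean_std(chunks)

# Cache the stats once; later runs normalize from the cache
# instead of scanning the whole dataset again.
stats = {"count": count, "mean": mean, "std": std}
normalized_first_chunk = [(x - stats["mean"]) / stats["std"] for x in chunks[0]]
```

The single pass matters here because the dataset is assumed not to fit in memory; once the stats are cached, normalizing any slice only needs that slice plus three numbers.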
27
28 Here is the current plan:
29 - Make a PyTableVariable
30 - Make a PyTableSubtensor op (or reuse the current one) to allow taking a
31 slice of the new variable
32 - Allow scan to work with this new PyTableVariable as input
33 - Allow scan to work with this new PyTableVariable as output
34
35 At first there will be no inplace ops, so view_map and destroy_map won't be supported. I see the variable as an interface to a file inside Theano, with no direct modification allowed at first.
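
The idea of a variable that is only an interface to a file (slices can be read, nothing can be modified in place) can be sketched outside Theano and PyTables. Everything here is an assumption made for illustration: the class name FileBackedVector is hypothetical, and a raw binary file of float64 values stands in for the real storage. It is not the planned PyTableVariable.

```python
import os
import struct
import tempfile

class FileBackedVector:
    """Hypothetical read-only view of float64 values stored in a file."""

    ITEM = struct.Struct("<d")  # one little-endian float64

    def __init__(self, path):
        self.path = path
        self.length = os.path.getsize(path) // self.ITEM.size

    def __len__(self):
        return self.length

    def __getitem__(self, key):
        # Only contiguous slices are supported, mirroring a basic
        # subtensor read; just the requested bytes are loaded.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop, step = key.indices(self.length)
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        n = max(0, stop - start)
        with open(self.path, "rb") as f:
            f.seek(start * self.ITEM.size)
            data = f.read(n * self.ITEM.size)
        return list(struct.unpack("<%dd" % n, data))

    def __setitem__(self, key, value):
        # No direct modification: the object is a view onto the file.
        raise TypeError("read-only: in-place modification is not allowed")

# Usage: write five values to a temporary file, then read a slice.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(struct.pack("<5d", 0.0, 1.0, 2.0, 3.0, 4.0))
vec = FileBackedVector(tmp.name)
window = vec[1:4]
os.remove(tmp.name)
```

Rejecting writes outright keeps the sketch in line with the "no inplace op at first" restriction: every read goes back to the file, so there is no cached buffer that could be aliased or overwritten.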
36
37 #clone OD repo
38 git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
39 #create a local branch that tracks the remote branch
40 git checkout -b variants origin/pytables