annotate doc/v2_planning/datalearn_pytables.txt @ 1396:310e22d7e44b

new file about datalearn in pytables.
author Frederic Bastien <nouiz@nouiz.org>
date Mon, 10 Jan 2011 14:55:39 -0500
parents
children 702a933794f7
Big Dataset/Output
==================

This file describes our current plan for handling datasets and outputs that don't fit in memory by using PyTables.


PyTables
--------

We try to fix that problem by making it possible to use PyTables in/with Theano.

Here are the tickets that I plan to do, in order:

-1) Fix the ift6266 script to load PNIST data (DONE)
0) Put PNIST in PyTables format
1) Example with PyTables data in Python
2) Basic filter on the dataset in Python
3) Example with PyTables output in Python
?) Put stats in the PyTables file so we don't have to read the whole file each time we normalize.
   Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
4) Example with PyTables data in Theano
5) Basic filter on the dataset in Theano
6) Example with PyTables output in Theano
7) Plan a way to store the output temporarily (delete it and store locally)
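
As a rough illustration of the stats item above, here is a minimal sketch of computing per-feature mean and standard deviation in a single chunked pass, so that the result can be cached instead of re-reading the data for every normalization. The helper names `iter_chunks` and `streaming_mean_std` are hypothetical, and anything that supports `len()` and row slicing can stand in for the on-disk array. The sum-of-squares formula is the simplest option, not the most numerically stable one.

```python
import numpy as np


def iter_chunks(array_like, chunk_rows):
    """Yield consecutive row slices of at most `chunk_rows` rows.

    `array_like` only needs len() and slicing (hypothetical helper).
    """
    for start in range(0, len(array_like), chunk_rows):
        yield array_like[start:start + chunk_rows]


def streaming_mean_std(chunks):
    """Return per-feature (mean, std) from an iterable of 2-D chunks.

    Only running sums are kept, so one chunk is in memory at a time.
    """
    count = 0
    total = None
    total_sq = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype="float64")
        if total is None:
            total = np.zeros(chunk.shape[1])
            total_sq = np.zeros(chunk.shape[1])
        count += chunk.shape[0]
        total += chunk.sum(axis=0)
        total_sq += (chunk ** 2).sum(axis=0)
    if count == 0:
        raise ValueError("no rows seen")
    mean = total / count
    # Clamp tiny negative values caused by rounding before the sqrt.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```

The two returned arrays are small, so they could be saved next to the data (for instance as attributes of the PyTables node) and reloaded for later normalization.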

Items 1, 2, and 3 are there to verify that PyTables can do what we want!
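
To make items 1 to 3 concrete, here is a minimal sketch of what such a check could look like in plain Python: write a dataset chunk by chunk, read it back in slices, apply a basic filter, and write the result to a second file. The function names, the node names `data` and `filtered`, and the random toy data are all assumptions for illustration. The snake_case calls assume PyTables 3.0 or later; older releases spelled them `openFile` and `createEArray`.

```python
import numpy as np
import tables  # PyTables; snake_case API assumes version >= 3.0


def write_dataset(path, n_rows=1000, n_features=8, chunk_rows=100, seed=0):
    """Write a toy dataset to an extendable array, one chunk at a time."""
    rng = np.random.RandomState(seed)
    with tables.open_file(path, mode="w") as h5:
        data = h5.create_earray(h5.root, "data", atom=tables.Float32Atom(),
                                shape=(0, n_features), expectedrows=n_rows)
        for _ in range(n_rows // chunk_rows):
            data.append(rng.rand(chunk_rows, n_features).astype("float32"))


def filter_to_output(in_path, out_path, threshold=0.5, chunk_rows=100):
    """Copy rows whose first feature exceeds `threshold` into a new file.

    At most `chunk_rows` rows are held in memory at any time.
    Returns the number of rows kept.
    """
    kept = 0
    with tables.open_file(in_path, mode="r") as src, \
            tables.open_file(out_path, mode="w") as dst:
        data = src.root.data
        out = dst.create_earray(dst.root, "filtered", atom=data.atom,
                                shape=(0,) + tuple(data.shape[1:]))
        for start in range(0, data.nrows, chunk_rows):
            chunk = data[start:start + chunk_rows]  # reads only this slice
            selected = chunk[chunk[:, 0] > threshold]
            if len(selected):
                out.append(selected)
            kept += len(selected)
    return kept
```

Calling `write_dataset("in.h5")` followed by `filter_to_output("in.h5", "out.h5")` exercises all three steps. The file names here are placeholders.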

Here is the current plan:
- Make a PyTableVariable
- Make a PyTableSubtensor op (or reuse the current one) to allow taking a
  slice of the new variable
- Allow scan to work with this new PyTableVariable as input
- Allow scan to work with this new PyTableVariable as output

At first there will be no in-place ops, so view_map and destroy_map won't be supported. I see this as an interface to a file inside Theano. No direct modification is allowed at first.
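
The "interface to a file" idea can be pictured with a small plain-Python analogy. This is not Theano code and not the planned PyTableVariable; the class name `ReadOnlyFileView` is hypothetical. Slicing always returns a fresh copy (so nothing is a view of the storage) and assignment is rejected (so nothing is modified in place).

```python
import numpy as np


class ReadOnlyFileView(object):
    """Hypothetical read-only wrapper around a sliceable on-disk array.

    `node` can be any object with `.shape` and slicing, for example a
    PyTables array node; a NumPy array works as a stand-in.
    """

    def __init__(self, node):
        self._node = node

    @property
    def shape(self):
        return tuple(self._node.shape)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, index):
        # np.array copies, so the caller never gets a view of the storage.
        return np.array(self._node[index])

    def __setitem__(self, index, value):
        raise TypeError("read-only view: in-place modification is not allowed")
```

With this shape of interface, every consumer works on its own copy of a slice, which is the simplest behaviour to reason about before any in-place optimization is considered.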

# clone OD repo
git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz
cd Pylearn.nouiz
# create a local branch that tracks the remote branch
git checkout -b variants origin/pytables