annotate doc/v2_planning/datalearn_pytables.txt @ 1470:94268a161925

memmap support in Dataset op
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 18 May 2011 10:50:21 -0400
parents 1934ba31b7d9
children
Big Dataset/Output
==================

This file shows the current plan for using PyTables to handle datasets and outputs that don't fit in memory.


PyTables
--------

We plan to solve this problem by making PyTables usable in and with Theano.

Here are the tickets I plan to work on, in order:

- Fix the ift6266 script to load the PNIST data (DONE)
- Put PNIST in PyTables format (a first version is done)
- Example with PyTables data in Python (done, with modifications needed in Theano)
- Basic filter on the dataset in Python
- Example with PyTables output in Python
- ? Put statistics in the PyTables file so we don't have to read the whole file each time we normalize.
  Maybe we will try another mechanism, but I will start with this one as it seems simple to do.
- Example with PyTables data in Theano
- Basic filter on the dataset in Theano
- Example with PyTables output in Theano
- Plan a way to store the output temporarily (store it locally, then delete it)
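
Below is a minimal Python sketch of the reading-side tickets above: PyTables data in Python, a basic filter, and cached normalization statistics. It is an illustration under assumptions, not the planned code. ``iter_minibatches``, ``normalization_stats`` and the attribute names ``norm_mean``/``norm_std`` are hypothetical. The functions only assume that ``data`` supports ``len()`` and slicing along its first axis. PyTables array nodes (``Array``, ``CArray``, ``EArray``) and ``numpy.memmap`` both satisfy this, so at most one minibatch is held in memory at a time.

.. code-block:: python

    import numpy as np


    def iter_minibatches(data, batch_size, row_filter=None):
        """Yield minibatches read one slice at a time from ``data``.

        ``data`` is any object supporting len() and slicing on its first axis
        (e.g. a PyTables Array/CArray/EArray node or a numpy.memmap).
        ``row_filter`` is a hypothetical optional hook mapping a batch to a
        boolean mask of the rows to keep.
        """
        n_rows = len(data)
        for start in range(0, n_rows, batch_size):
            # Only this slice is read from disk into memory.
            batch = np.asarray(data[start:start + batch_size])
            if row_filter is not None:
                batch = batch[row_filter(batch)]
            if len(batch) > 0:  # skip batches emptied by the filter
                yield batch


    def normalization_stats(data, attrs, batch_size=1024):
        """Return per-column (mean, std), cached as attributes on ``attrs``.

        On the first call the statistics are computed in a single streaming
        pass; later calls return the cached values without reading ``data``.
        """
        mean = getattr(attrs, "norm_mean", None)
        std = getattr(attrs, "norm_std", None)
        if mean is not None and std is not None:
            return mean, std

        count, total, total_sq = 0, 0.0, 0.0
        for batch in iter_minibatches(data, batch_size):
            batch = batch.astype(np.float64)
            count += batch.shape[0]
            total = total + batch.sum(axis=0)
            total_sq = total_sq + (batch ** 2).sum(axis=0)
        if count == 0:
            raise ValueError("cannot compute statistics of an empty dataset")

        mean = total / count
        # Population std from E[x^2] - E[x]^2, clipped at 0 against rounding.
        std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
        attrs.norm_mean = mean  # with PyTables this writes an HDF5 attribute
        attrs.norm_std = std
        return mean, std

With PyTables, ``data`` would be an array node of a file opened with ``tables.open_file()`` (``openFile()`` in PyTables 2.x), and ``attrs`` would be that node's ``attrs``. The statistics are then stored as HDF5 attributes next to the data. Caching them requires opening the file in a writable mode such as ``"a"``. The sum-of-squares formula is the simplest single-pass choice, but it can lose precision; Welford's algorithm is a more robust alternative.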

The first three tickets are there to verify that PyTables can do what we want!

Here is the current plan:

- Make a PyTableVariable.
- Make a PyTableSubtensor op (or reuse the existing one) to allow taking a
  slice of the new variable.
- Allow scan to work with this new PyTableVariable as input.
- Allow scan to work with this new PyTableVariable as output.

At first there will be no in-place ops, so ``view_map`` and ``destroy_map`` won't be supported. I see it as an interface to a file inside Theano; at first, no direct modification will be allowed.
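
The plan above can be illustrated outside of Theano with a small plain-Python sketch. ``LazyRows`` is a hypothetical class, not the planned PyTableVariable or PyTableSubtensor op. Slicing it works like a subtensor op on a symbolic variable: it only records which rows are wanted, composing the slices with Python ``range`` slicing. The file is read once, with a single slice, when ``read()`` is called. Following the no-in-place rule, item assignment raises an error. The sketch makes two assumptions. First, only slices with a positive step are handled. Second, ``data`` is any object sliceable along its first axis, such as a PyTables array node or a ``numpy.memmap``.

.. code-block:: python

    import numpy as np


    class LazyRows:
        """Hypothetical stand-in for a read-only PyTableVariable.

        Slicing only records which rows are wanted (as a ``range``); nothing
        is read until ``read()`` is called, and then with a single slice.
        """

        def __init__(self, data, rows=None):
            self._data = data  # e.g. a PyTables array node or numpy.memmap
            self._rows = range(len(data)) if rows is None else rows

        def __len__(self):
            return len(self._rows)

        def __getitem__(self, key):
            if not isinstance(key, slice):
                raise TypeError("this sketch only supports slices")
            rows = self._rows[key]  # slicing a range composes the two slices
            if rows.step < 0:
                raise ValueError("negative steps are not supported in this sketch")
            return LazyRows(self._data, rows)

        def __setitem__(self, key, value):
            # No in-place modification: the variable is a read-only file view.
            raise TypeError("LazyRows is read-only")

        def read(self):
            """Read the selected rows from disk as one NumPy array."""
            r = self._rows
            if len(r) == 0:
                return np.asarray(self._data[0:0])
            return np.asarray(self._data[r.start:r.stop:r.step])

For example, ``LazyRows(data)[2:9][::3].read()`` performs one read of rows 2, 5 and 8 instead of first materializing rows 2 to 8. A scan over minibatches could similarly call ``read()`` on one slice per step.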
Getting the code
----------------

- Clone FB's repository:

.. code-block:: bash

    git clone git@github.com:nouiz/pylearn.git Pylearn.nouiz

- Create a local branch that tracks the remote branch:

.. code-block:: bash

    cd Pylearn.nouiz
    git checkout -b variants origin/pytables