Mercurial > ift6266
annotate data_generation/transformations/filetensor.py @ 643:24d9819a810f
reviews aistats finales
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Thu, 24 Mar 2011 17:04:38 -0400 |
parents | 1f5937e9e530 |
children |
rev | line source |
---|---|
10
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
1 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
2 Read and write the matrix file format described at |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
3 U{http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html} |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
4 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
5 The format is for dense tensors: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
6 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
7 - magic number indicating type and endianness - 4bytes |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
8 - rank of tensor - int32 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
9 - dimensions - int32, int32, int32, ... |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
10 - <data> |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
11 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
12 The number of dimensions and rank is slightly tricky: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
13 - for scalar: rank=0, dimensions = [1, 1, 1] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
14 - for vector: rank=1, dimensions = [?, 1, 1] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
15 - for matrix: rank=2, dimensions = [?, ?, 1] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
16 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
17 For rank >= 3, the number of dimensions matches the rank exactly. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
18 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
19 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
20 @todo: add complex type support |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
21 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
22 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
23 import sys |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
24 import numpy |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
25 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
26 def _prod(lst): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
27 p = 1 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
28 for l in lst: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
29 p *= l |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
30 return p |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
31 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
32 _magic_dtype = { |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
33 0x1E3D4C51 : ('float32', 4), |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
34 #0x1E3D4C52 : ('packed matrix', 0), #what is a packed matrix? |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
35 0x1E3D4C53 : ('float64', 8), |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
36 0x1E3D4C54 : ('int32', 4), |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
37 0x1E3D4C55 : ('uint8', 1), |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
38 0x1E3D4C56 : ('int16', 2), |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
39 } |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
40 _dtype_magic = { |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
41 'float32': 0x1E3D4C51, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
42 #'packed matrix': 0x1E3D4C52, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
43 'float64': 0x1E3D4C53, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
44 'int32': 0x1E3D4C54, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
45 'uint8': 0x1E3D4C55, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
46 'int16': 0x1E3D4C56 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
47 } |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
48 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
49 def _read_int32(f): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
50 """unpack a 4-byte integer from the current position in file f""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
51 s = f.read(4) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
52 s_array = numpy.fromstring(s, dtype='int32') |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
53 return s_array.item() |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
54 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
55 def _read_header(f, debug=False): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
56 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
57 :returns: data type, element size, rank, shape, size |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
58 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
59 #what is the data type of this matrix? |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
60 #magic_s = f.read(4) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
61 #magic = numpy.fromstring(magic_s, dtype='int32') |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
62 magic = _read_int32(f) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
63 magic_t, elsize = _magic_dtype[magic] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
64 if debug: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
65 print 'header magic', magic, magic_t, elsize |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
66 if magic_t == 'packed matrix': |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
67 raise NotImplementedError('packed matrix not supported') |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
68 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
69 #what is the rank of the tensor? |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
70 ndim = _read_int32(f) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
71 if debug: print 'header ndim', ndim |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
72 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
73 #what are the dimensions of the tensor? |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
74 dim = numpy.fromfile(f, dtype='int32', count=max(ndim,3))[:ndim] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
75 dim_size = _prod(dim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
76 if debug: print 'header dim', dim, dim_size |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
77 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
78 return magic_t, elsize, ndim, dim, dim_size |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
79 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
80 class arraylike(object): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
81 """Provide an array-like interface to the filetensor in f. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
82 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
83 The rank parameter to __init__ controls how this object interprets the underlying tensor. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
84 Its behaviour should be clear from the following example. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
85 Suppose the underlying tensor is MxNxK. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
86 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
87 - If rank is 0, self[i] will be a scalar and len(self) == M*N*K. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
88 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
89 - If rank is 1, self[i] is a vector of length K, and len(self) == M*N. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
90 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
91 - If rank is 3, self[i] is a 3D tensor of size MxNxK, and len(self)==1. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
92 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
93 - If rank is 5, self[i] is a 5D tensor of size 1x1xMxNxK, and len(self) == 1. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
94 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
95 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
96 :note: Objects of this class generally require exclusive use of the underlying file handle, because |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
97 they call seek() every time you access an element. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
98 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
99 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
100 f = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
101 """File-like object""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
102 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
103 magic_t = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
104 """numpy data type of array""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
105 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
106 elsize = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
107 """number of bytes per scalar element""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
108 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
109 ndim = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
110 """Rank of underlying tensor""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
111 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
112 dim = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
113 """tuple of array dimensions (aka shape)""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
114 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
115 dim_size = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
116 """number of scalars in the tensor (prod of dim)""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
117 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
118 f_start = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
119 """The file position of the first element of the tensor""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
120 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
121 readshape = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
122 """tuple of array dimensions of the block that we read""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
123 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
124 readsize = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
125 """number of elements we must read for each block""" |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
126 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
127 def __init__(self, f, rank=0, debug=False): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
128 self.f = f |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
129 self.magic_t, self.elsize, self.ndim, self.dim, self.dim_size = _read_header(f,debug) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
130 self.f_start = f.tell() |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
131 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
132 if rank <= self.ndim: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
133 self.readshape = tuple(self.dim[self.ndim-rank:]) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
134 else: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
135 self.readshape = tuple(self.dim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
136 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
137 #self.readshape = tuple(self.dim[self.ndim-rank:]) if rank <= self.ndim else tuple(self.dim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
138 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
139 if rank <= self.ndim: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
140 padding = tuple() |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
141 else: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
142 padding = (1,) * (rank - self.ndim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
143 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
144 #padding = tuple() if rank <= self.ndim else (1,) * (rank - self.ndim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
145 self.returnshape = padding + self.readshape |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
146 self.readsize = _prod(self.readshape) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
147 if debug: print 'READ PARAM', self.readshape, self.returnshape, self.readsize |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
148 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
149 def __len__(self): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
150 return _prod(self.dim[:self.ndim-len(self.readshape)]) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
151 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
152 def __getitem__(self, idx): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
153 if idx >= len(self): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
154 raise IndexError(idx) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
155 self.f.seek(self.f_start + idx * self.elsize * self.readsize) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
156 return numpy.fromfile(self.f, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
157 dtype=self.magic_t, |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
158 count=self.readsize).reshape(self.returnshape) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
159 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
160 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
161 # |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
162 # TODO: implement item selection: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
163 # e.g. load('some mat', subtensor=(:6, 2:5)) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
164 # |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
165 # This function should be memory efficient by: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
166 # - allocating an output matrix at the beginning |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
167 # - seeking through the file, reading subtensors from multiple places |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
168 def read(f, subtensor=None, debug=False): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
169 """Load all or part of file 'f' into a numpy ndarray |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
170 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
171 @param f: file from which to read |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
172 @type f: file-like object |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
173 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
174 If subtensor is not None, it should be like the argument to |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
175 numpy.ndarray.__getitem__. The following two expressions should return |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
176 equivalent ndarray objects, but the one on the left may be faster and more |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
177 memory efficient if the underlying file f is big. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
178 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
179 read(f, subtensor) <===> read(f)[*subtensor] |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
180 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
181 Support for subtensors is currently spotty, so check the code to see if your |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
182 particular type of subtensor is supported. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
183 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
184 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
185 magic_t, elsize, ndim, dim, dim_size = _read_header(f,debug) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
186 f_start = f.tell() |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
187 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
188 rval = None |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
189 if subtensor is None: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
190 rval = numpy.fromfile(f, dtype=magic_t, count=_prod(dim)).reshape(dim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
191 elif isinstance(subtensor, slice): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
192 if subtensor.step not in (None, 1): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
193 raise NotImplementedError('slice with step', subtensor.step) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
194 if subtensor.start not in (None, 0): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
195 bytes_per_row = _prod(dim[1:]) * elsize |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
196 f.seek(f_start + subtensor.start * bytes_per_row) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
197 dim[0] = min(dim[0], subtensor.stop) - subtensor.start |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
198 rval = numpy.fromfile(f, dtype=magic_t, count=_prod(dim)).reshape(dim) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
199 else: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
200 raise NotImplementedError('subtensor access not written yet:', subtensor) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
201 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
202 return rval |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
203 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
204 def write(f, mat): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
205 """Write a numpy.ndarray to file. |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
206 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
207 @param f: file into which to write |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
208 @type f: file-like object |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
209 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
210 @param mat: array to write to file |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
211 @type mat: numpy ndarray or compatible |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
212 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
213 """ |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
214 def _write_int32(f, i): |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
215 i_array = numpy.asarray(i, dtype='int32') |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
216 if 0: print 'writing int32', i, i_array |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
217 i_array.tofile(f) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
218 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
219 try: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
220 _write_int32(f, _dtype_magic[str(mat.dtype)]) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
221 except KeyError: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
222 raise TypeError('Invalid ndarray dtype for filetensor format', mat.dtype) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
223 |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
224 _write_int32(f, len(mat.shape)) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
225 shape = mat.shape |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
226 if len(shape) < 3: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
227 shape = list(shape) + [1] * (3 - len(shape)) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
228 if 0: print 'writing shape =', shape |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
229 for sh in shape: |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
230 _write_int32(f, sh) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
231 mat.tofile(f) |
faacc76d21c2
Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
232 |