annotate data_generation/transformations/filetensor.py @ 613:5e481b224117

fix the reading of PNIST dataset following Dumi compression of the data.
author Frederic Bastien <nouiz@nouiz.org>
date Thu, 06 Jan 2011 13:57:05 -0500
parents 1f5937e9e530
children
rev   line source
10
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
1 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
2 Read and write the matrix file format described at
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
3 U{http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html}
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
4
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
5 The format is for dense tensors:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
6
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
7 - magic number indicating type and endianness - 4bytes
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
8 - rank of tensor - int32
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
9 - dimensions - int32, int32, int32, ...
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
10 - <data>
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
11
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
12 The number of dimensions and rank is slightly tricky:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
13 - for scalar: rank=0, dimensions = [1, 1, 1]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
14 - for vector: rank=1, dimensions = [?, 1, 1]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
15 - for matrix: rank=2, dimensions = [?, ?, 1]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
16
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
17 For rank >= 3, the number of dimensions matches the rank exactly.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
18
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
19
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
20 @todo: add complex type support
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
21
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
22 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
23 import sys
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
24 import numpy
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
25
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
26 def _prod(lst):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
27 p = 1
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
28 for l in lst:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
29 p *= l
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
30 return p
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
31
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
32 _magic_dtype = {
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
33 0x1E3D4C51 : ('float32', 4),
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
34 #0x1E3D4C52 : ('packed matrix', 0), #what is a packed matrix?
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
35 0x1E3D4C53 : ('float64', 8),
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
36 0x1E3D4C54 : ('int32', 4),
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
37 0x1E3D4C55 : ('uint8', 1),
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
38 0x1E3D4C56 : ('int16', 2),
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
39 }
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
40 _dtype_magic = {
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
41 'float32': 0x1E3D4C51,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
42 #'packed matrix': 0x1E3D4C52,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
43 'float64': 0x1E3D4C53,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
44 'int32': 0x1E3D4C54,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
45 'uint8': 0x1E3D4C55,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
46 'int16': 0x1E3D4C56
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
47 }
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
48
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
49 def _read_int32(f):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
50 """unpack a 4-byte integer from the current position in file f"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
51 s = f.read(4)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
52 s_array = numpy.fromstring(s, dtype='int32')
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
53 return s_array.item()
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
54
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
55 def _read_header(f, debug=False):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
56 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
57 :returns: data type, element size, rank, shape, size
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
58 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
59 #what is the data type of this matrix?
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
60 #magic_s = f.read(4)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
61 #magic = numpy.fromstring(magic_s, dtype='int32')
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
62 magic = _read_int32(f)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
63 magic_t, elsize = _magic_dtype[magic]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
64 if debug:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
65 print 'header magic', magic, magic_t, elsize
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
66 if magic_t == 'packed matrix':
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
67 raise NotImplementedError('packed matrix not supported')
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
68
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
69 #what is the rank of the tensor?
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
70 ndim = _read_int32(f)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
71 if debug: print 'header ndim', ndim
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
72
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
73 #what are the dimensions of the tensor?
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
74 dim = numpy.fromfile(f, dtype='int32', count=max(ndim,3))[:ndim]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
75 dim_size = _prod(dim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
76 if debug: print 'header dim', dim, dim_size
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
77
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
78 return magic_t, elsize, ndim, dim, dim_size
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
79
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
80 class arraylike(object):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
81 """Provide an array-like interface to the filetensor in f.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
82
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
83 The rank parameter to __init__ controls how this object interprets the underlying tensor.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
84 Its behaviour should be clear from the following example.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
85 Suppose the underlying tensor is MxNxK.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
86
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
87 - If rank is 0, self[i] will be a scalar and len(self) == M*N*K.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
88
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
89 - If rank is 1, self[i] is a vector of length K, and len(self) == M*N.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
90
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
91 - If rank is 3, self[i] is a 3D tensor of size MxNxK, and len(self)==1.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
92
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
93 - If rank is 5, self[i] is a 5D tensor of size 1x1xMxNxK, and len(self) == 1.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
94
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
95
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
96 :note: Objects of this class generally require exclusive use of the underlying file handle, because
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
97 they call seek() every time you access an element.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
98 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
99
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
100 f = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
101 """File-like object"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
102
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
103 magic_t = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
104 """numpy data type of array"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
105
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
106 elsize = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
107 """number of bytes per scalar element"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
108
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
109 ndim = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
110 """Rank of underlying tensor"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
111
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
112 dim = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
113 """tuple of array dimensions (aka shape)"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
114
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
115 dim_size = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
116 """number of scalars in the tensor (prod of dim)"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
117
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
118 f_start = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
119 """The file position of the first element of the tensor"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
120
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
121 readshape = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
122 """tuple of array dimensions of the block that we read"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
123
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
124 readsize = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
125 """number of elements we must read for each block"""
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
126
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
127 def __init__(self, f, rank=0, debug=False):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
128 self.f = f
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
129 self.magic_t, self.elsize, self.ndim, self.dim, self.dim_size = _read_header(f,debug)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
130 self.f_start = f.tell()
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
131
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
132 if rank <= self.ndim:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
133 self.readshape = tuple(self.dim[self.ndim-rank:])
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
134 else:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
135 self.readshape = tuple(self.dim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
136
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
137 #self.readshape = tuple(self.dim[self.ndim-rank:]) if rank <= self.ndim else tuple(self.dim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
138
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
139 if rank <= self.ndim:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
140 padding = tuple()
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
141 else:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
142 padding = (1,) * (rank - self.ndim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
143
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
144 #padding = tuple() if rank <= self.ndim else (1,) * (rank - self.ndim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
145 self.returnshape = padding + self.readshape
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
146 self.readsize = _prod(self.readshape)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
147 if debug: print 'READ PARAM', self.readshape, self.returnshape, self.readsize
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
148
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
149 def __len__(self):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
150 return _prod(self.dim[:self.ndim-len(self.readshape)])
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
151
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
152 def __getitem__(self, idx):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
153 if idx >= len(self):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
154 raise IndexError(idx)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
155 self.f.seek(self.f_start + idx * self.elsize * self.readsize)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
156 return numpy.fromfile(self.f,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
157 dtype=self.magic_t,
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
158 count=self.readsize).reshape(self.returnshape)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
159
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
160
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
161 #
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
162 # TODO: implement item selection:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
163 # e.g. load('some mat', subtensor=(:6, 2:5))
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
164 #
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
165 # This function should be memory efficient by:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
166 # - allocating an output matrix at the beginning
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
167 # - seeking through the file, reading subtensors from multiple places
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
168 def read(f, subtensor=None, debug=False):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
169 """Load all or part of file 'f' into a numpy ndarray
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
170
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
171 @param f: file from which to read
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
172 @type f: file-like object
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
173
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
174 If subtensor is not None, it should be like the argument to
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
175 numpy.ndarray.__getitem__. The following two expressions should return
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
176 equivalent ndarray objects, but the one on the left may be faster and more
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
177 memory efficient if the underlying file f is big.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
178
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
179 read(f, subtensor) <===> read(f)[*subtensor]
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
180
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
181 Support for subtensors is currently spotty, so check the code to see if your
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
182 particular type of subtensor is supported.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
183
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
184 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
185 magic_t, elsize, ndim, dim, dim_size = _read_header(f,debug)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
186 f_start = f.tell()
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
187
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
188 rval = None
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
189 if subtensor is None:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
190 rval = numpy.fromfile(f, dtype=magic_t, count=_prod(dim)).reshape(dim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
191 elif isinstance(subtensor, slice):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
192 if subtensor.step not in (None, 1):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
193 raise NotImplementedError('slice with step', subtensor.step)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
194 if subtensor.start not in (None, 0):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
195 bytes_per_row = _prod(dim[1:]) * elsize
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
196 f.seek(f_start + subtensor.start * bytes_per_row)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
197 dim[0] = min(dim[0], subtensor.stop) - subtensor.start
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
198 rval = numpy.fromfile(f, dtype=magic_t, count=_prod(dim)).reshape(dim)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
199 else:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
200 raise NotImplementedError('subtensor access not written yet:', subtensor)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
201
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
202 return rval
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
203
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
204 def write(f, mat):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
205 """Write a numpy.ndarray to file.
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
206
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
207 @param f: file into which to write
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
208 @type f: file-like object
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
209
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
210 @param mat: array to write to file
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
211 @type mat: numpy ndarray or compatible
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
212
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
213 """
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
214 def _write_int32(f, i):
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
215 i_array = numpy.asarray(i, dtype='int32')
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
216 if 0: print 'writing int32', i, i_array
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
217 i_array.tofile(f)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
218
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
219 try:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
220 _write_int32(f, _dtype_magic[str(mat.dtype)])
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
221 except KeyError:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
222 raise TypeError('Invalid ndarray dtype for filetensor format', mat.dtype)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
223
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
224 _write_int32(f, len(mat.shape))
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
225 shape = mat.shape
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
226 if len(shape) < 3:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
227 shape = list(shape) + [1] * (3 - len(shape))
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
228 if 0: print 'writing shape =', shape
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
229 for sh in shape:
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
230 _write_int32(f, sh)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
231 mat.tofile(f)
faacc76d21c2 Basic new pipeline script for the images tranforms
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
232