I have a large CArray of integers in PyTables (about 1 million rows and 50,000 columns):
In [22]: fmat
Out[22]:
/fmat (CArray(1025461, 54123), shuffle, blosc(5)) ''
atom := Int32Atom(shape=(), dflt=0)
maindim := 0
flavor := 'numpy'
byteorder := 'little'
chunkshape := (9, 54123)
Selecting a row is fast:
In [24]: %timeit fmat[0]
10000 loops, best of 3: 46.5 µs per loop
But selecting a column takes forever:
In [25]: %timeit fmat[:,0]
1 loop, best of 3: 25 s per loop
Is there an efficient way to index columns, or to transpose the array, so that column slicing is fast?
Answer 0 (score: 0)
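The answer lies in the chunkshape parameter used when creating the array. With chunkshape (9, 54123), each chunk holds 9 complete rows, so reading a single column means reading and decompressing every chunk in the array.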
If you only need column slices, make each chunk a single column. For example, for an N×P matrix (N rows and P columns), choose:
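import tables as tb
# f is an open, writable tables.File; filters is a tb.Filters instance.
# One chunk per column: reading fmat[:, j] touches a single chunk.
fmat = f.create_carray(f.root, 'fmat', tb.Int32Atom(),
                       shape=(N, P), filters=filters,
                       chunkshape=(N, 1))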
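To see why the chunkshape matters so much, here is a rough back-of-the-envelope sketch in plain Python (no PyTables needed). The helper names `chunks_for_slice` and `bytes_for_slice` are made up for illustration, and the byte counts are uncompressed sizes that ignore Blosc compression and any caching, so treat the numbers as an order-of-magnitude estimate only.

```python
import math

def chunks_for_slice(shape, chunkshape, kind):
    """Number of chunks that one full row or one full column intersects."""
    n_rows, n_cols = shape
    c_rows, c_cols = chunkshape
    if kind == "row":
        # A row sits inside one band of chunks and crosses every chunk column.
        return math.ceil(n_cols / c_cols)
    if kind == "column":
        # A column sits inside one chunk column and crosses every band of rows.
        return math.ceil(n_rows / c_rows)
    raise ValueError("kind must be 'row' or 'column'")

def bytes_for_slice(shape, chunkshape, kind, itemsize=4):
    """Uncompressed bytes that must be read, since chunks are read whole."""
    c_rows, c_cols = chunkshape
    return chunks_for_slice(shape, chunkshape, kind) * c_rows * c_cols * itemsize

shape = (1025461, 54123)       # shape of fmat in the question
row_major = (9, 54123)         # chunkshape shown in the question
col_major = (shape[0], 1)      # chunkshape suggested in this answer

for name, cs in (("row-oriented", row_major), ("column-oriented", col_major)):
    for kind in ("row", "column"):
        n = chunks_for_slice(shape, cs, kind)
        gib = bytes_for_slice(shape, cs, kind) / 2**30
        print(f"{name:16s} {kind:7s} chunks={n:7d} uncompressed={gib:10.4f} GiB")
```

Note the trade-off: with chunkshape (N, 1) a column read touches one chunk, but a row read now touches all P chunks. If you need both access patterns, a chunkshape somewhere in between (or keeping a second, transposed copy of the array) may be worth testing on your own data.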