Fast column slicing in PyTables

Date: 2016-10-04 16:23:04

Tags: python arrays slice transpose pytables

I have a large integer CArray in PyTables (about 1 million rows and 50,000 columns):

In [22]: fmat
Out[22]: 
/fmat (CArray(1025461, 54123), shuffle, blosc(5)) ''
  atom := Int32Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (9, 54123)

Selecting a row is fast:

In [24]: %timeit fmat[0]
10000 loops, best of 3: 46.5 µs per loop

But selecting a column takes forever:

In [25]: %timeit fmat[:,0]
1 loop, best of 3: 25 s per loop

Is there an efficient way to index columns, or to transpose the array, so that column slices are fast?

1 answer:

Answer 0 (score: 0)

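The answer lies in the chunkshape parameter passed when the array is created. A chunk is the unit of I/O and compression, so with chunkshape (9, 54123) each chunk holds 9 complete rows, and reading a single column means reading and decompressing every chunk in the array.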
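To make the cost concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes that every chunk intersecting a selection is read and decompressed in full, and it reports nominal uncompressed sizes only (it ignores the compression ratio and any chunk cache). The helper names chunk_grid, column_read_cost and row_read_cost are hypothetical, not part of PyTables.

```python
import math

ITEMSIZE = 4  # bytes per element for Int32Atom


def chunk_grid(shape, chunkshape):
    """Number of chunks along each dimension of the array."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunkshape))


def chunk_nbytes(chunkshape):
    """Nominal uncompressed size of one chunk in bytes."""
    return chunkshape[0] * chunkshape[1] * ITEMSIZE


def column_read_cost(shape, chunkshape):
    """(chunks touched, uncompressed bytes processed) for fmat[:, j]."""
    chunks = chunk_grid(shape, chunkshape)[0]  # one chunk per block of rows
    return chunks, chunks * chunk_nbytes(chunkshape)


def row_read_cost(shape, chunkshape):
    """(chunks touched, uncompressed bytes processed) for fmat[i]."""
    chunks = chunk_grid(shape, chunkshape)[1]  # one chunk per block of columns
    return chunks, chunks * chunk_nbytes(chunkshape)


shape = (1025461, 54123)    # shape reported in the question
row_chunks = (9, 54123)     # chunkshape reported in the question
col_chunks = (1025461, 1)   # one full column per chunk

for name, cs in [("row-oriented", row_chunks), ("column-oriented", col_chunks)]:
    print(name, "column read:", column_read_cost(shape, cs))
    print(name, "row read:   ", row_read_cost(shape, cs))
```

Under these assumptions, the column read with the original layout touches every chunk in the file, while the same read with one column per chunk touches a single chunk of about 4 MB. The row read shows the opposite pattern.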

If you only need column slices, make each chunk a single full column. Row slices then become slow for the same reason, so this layout suits column-only access. For example, for an N×P matrix (N rows and P columns), choose:

fmat = f.create_carray(f.root, 'fmat', tb.Int32Atom(), shape=(N, P), filters=filters, chunkshape=[N,1])
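A fuller version of the call above, as a minimal sketch on a small array. The sizes N and P, the temporary file name, and the sample data are assumptions chosen for illustration; the filter settings mirror the shuffle, blosc(5) shown in the question.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Small illustrative sizes (assumption); the array in the question is far larger.
N, P = 1000, 50

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "fmat_demo.h5")
filters = tb.Filters(complevel=5, complib="blosc", shuffle=True)

data = np.arange(N * P, dtype=np.int32).reshape(N, P)

with tb.open_file(path, mode="w") as f:
    # One chunk per column, so a column slice touches exactly one chunk.
    fmat = f.create_carray(f.root, "fmat", tb.Int32Atom(), shape=(N, P),
                           filters=filters, chunkshape=(N, 1))
    fmat[:, :] = data

with tb.open_file(path, mode="r") as f:
    fmat = f.root.fmat
    stored_chunkshape = tuple(fmat.chunkshape)
    col = fmat[:, 3]  # reads and decompresses a single chunk

print(stored_chunkshape, col.shape)
```

For an existing file, the data would have to be copied into a new CArray created with the new chunkshape, since the chunk shape is fixed when the array is created.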