Question

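I have a numpy array of integers.

I also have two other arrays holding the start and length (or, alternatively, start and end) indices into this array. They identify the sequences of integers I need to process. The sequences are of variable length.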

x=numpy.array([2,3,5,7,9,12,15,21,27,101, 250]) #Can have length of millions

starts=numpy.array([2,7]) # Can have lengths of thousands
ends=numpy.array([5,9])

# required output is x[2:5],x[7:9] in flat 1D array
# [5,7,9,21,27]

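I can do this easily with a for loop, but the application is performance-sensitive, so I'm looking for a way to do it without Python iteration.

Any help would be much appreciated!

Doug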
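
For reference, the loop-based baseline the question mentions could look like the sketch below. It is an illustration only, not the asker's code: the helper name `slices_with_loop` is hypothetical, and the ranges are assumed to be half-open `[start, end)`.

```python
import numpy as np

def slices_with_loop(x, starts, ends):
    # Take one slice per (start, end) pair in Python, then join the pieces.
    pieces = [x[s:e] for s, e in zip(starts, ends)]
    # np.concatenate rejects an empty list, so return an empty slice in that case.
    return np.concatenate(pieces) if pieces else x[:0]

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])
print(slices_with_loop(x, starts, ends))  # [ 5  7  9 21 27]
```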

Answer 1

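Approach 1

One vectorized approach is to build a mask with broadcasting:

In [16]: r = np.arange(len(x))
In [18]: x[((r>=starts[:,None]) & (r<ends[:,None])).any(0)]
Out[18]: array([ 5,  7,  9, 21, 27])

Approach 2

Another vectorized approach uses cumsum to create a ramp of 1s and 0s. It should scale better with many start-end pairs, since approach 1 builds a len(starts) x len(x) mask. Like so:

# assumes the [start, end) ranges neither overlap nor touch each other
idx = np.zeros(len(x),dtype=int)
idx[starts] = 1                      # +1 where a range begins
idx[ends[ends<len(x)]] = -1          # -1 where it ends; an end equal to len(x) needs no marker
out = x[idx.cumsum().astype(bool)]   # running sum is 1 inside a range, 0 outside

Approach 3

Another option is loop-based and more memory-efficient. It could be the better choice when the start-end pairs have many entries:

mask = np.zeros(len(x),dtype=bool)
for (i,j) in zip(starts,ends):
    mask[i:j] = True                 # mark every position covered by a range
out = x[mask]

Approach 4

For completeness, here is another loop-based approach that selects the slices and assigns them into a pre-allocated output array. It should do well when the slices are picked out of a large array.

If there are many iterations, a minor optimization can reduce the computation per iteration.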
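
The pre-allocated loop from approach 4 and its lighter-weight variant could be sketched as follows. This is an illustration under assumptions, not necessarily the answerer's exact code: the function names `gather_slices` and `gather_slices_precomputed` and the helper arrays `lens` and `lims` are hypothetical, and the ranges are taken to be half-open `[start, end)`.

```python
import numpy as np

def gather_slices(x, starts, ends):
    """Copy x[s:e] for each (s, e) pair into one pre-allocated 1D array."""
    lens = ends - starts                        # length of every slice
    out = np.empty(lens.sum(), dtype=x.dtype)   # allocate the output once
    pos = 0                                     # current write position in out
    for s, e, n in zip(starts, ends, lens):
        out[pos:pos + n] = x[s:e]
        pos += n
    return out

def gather_slices_precomputed(x, starts, ends):
    """Same result, with the output offsets computed up front via cumsum."""
    lens = ends - starts
    # lims[k] is where slice k starts in out; lims[-1] is the total length.
    lims = np.concatenate(([0], lens)).cumsum()
    out = np.empty(lims[-1], dtype=x.dtype)
    for s, e, a, b in zip(starts, ends, lims[:-1], lims[1:]):
        out[a:b] = x[s:e]                       # no running counter to update
    return out

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])
print(gather_slices(x, starts, ends))              # [ 5  7  9 21 27]
print(gather_slices_precomputed(x, starts, ends))  # [ 5  7  9 21 27]
```

Unlike the mask-based approaches, this copies each slice in the order given, so overlapping or unsorted ranges produce repeated or reordered elements rather than a position-sorted selection.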

How can I define multiple slices of a numpy array based on pairs of start/end indices without iterating?