Question

Say I have an np.array like this:

a = [1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78]

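Is there a fast way to get the indices of all positions where 3 consecutive numbers are all above some threshold? That is, for some threshold th, get every x for which the following holds: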

a[x]>th and a[x+1]>th and a[x+2]>th

Example: for a threshold of 40 and the list given above, x should be [4, 8, 9].

Many thanks.

Answer 1

Approach #1

Use convolution on the boolean mask obtained from the comparison:

In [40]: a # input array
Out[40]: array([ 1,  3,  4,  5, 60, 43, 53,  4, 46, 54, 56, 78])

In [42]: N = 3 # compare N consecutive numbers

In [44]: T = 40 # threshold for comparison

In [45]: np.flatnonzero(np.convolve(a>T, np.ones(N, dtype=int),'valid')>=N)
Out[45]: array([4, 8, 9])
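To make the convolution step easier to follow, here is a minimal sketch that exposes the intermediate window counts for the sample array. The variable names mask, counts and starts are illustrative only and do not appear in the answer above; the explicit cast to int is an added precaution, not part of the original one-liner.

```python
import numpy as np

a = np.array([1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
N = 3   # number of consecutive values to check
T = 40  # threshold

# Boolean mask of values above the threshold, cast to int for the convolution.
mask = (a > T).astype(int)

# With a kernel of N ones, each entry of `counts` is the number of
# above-threshold values in the window of length N starting at that index.
counts = np.convolve(mask, np.ones(N, dtype=int), 'valid')

# A window qualifies only when all N of its values are above the threshold.
starts = np.flatnonzero(counts >= N)

print(counts)  # [0 0 1 2 3 2 2 2 3 3]
print(starts)  # [4 8 9]
```

Because 'valid' mode only keeps full windows, counts has len(a) - N + 1 entries, so every reported index x is guaranteed to have x + N - 1 inside the array.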

Approach #2

Use binary_erosion:

In [77]: from scipy.ndimage import binary_erosion  # the scipy.ndimage.morphology namespace is deprecated

In [31]: np.flatnonzero(binary_erosion(a>T,np.ones(N, dtype=int), origin=-(N//2)))
Out[31]: array([4, 8, 9])

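Approach #3 (special case): checking a small number of consecutive values

To check a small number of consecutive values (3 in this case), we can also slice the comparison mask for better performance: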

m = a>T
out = np.flatnonzero(m[:-2] & m[1:-1] & m[2:])
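The slicing idea above is written out for exactly 3 values. As a rough sketch of how it might be generalized to any small window length, the helper below ANDs together n shifted slices of the mask. The function name consecutive_above and its parameters are hypothetical and not part of the original answer.

```python
import numpy as np

def consecutive_above(a, threshold, n):
    """Hypothetical helper: start indices of every run of n consecutive
    values that are all strictly greater than threshold (n >= 1)."""
    m = np.asarray(a) > threshold
    size = len(m) - n + 1          # number of full windows of length n
    if size <= 0:
        return np.array([], dtype=np.intp)
    out = m[:size].copy()          # window position 0
    for k in range(1, n):          # AND in window positions 1 .. n-1
        out &= m[k:k + size]
    return np.flatnonzero(out)

a = np.array([1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
print(consecutive_above(a, 40, 3))  # [4 8 9]
print(consecutive_above(a, 40, 4))  # [8]
```

The loop runs n - 1 vectorized passes over the mask, so this form is likely to pay off only while n stays small; for large n the convolution or erosion approaches are probably the better fit.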

Benchmarking

Timings on the given sample array, tiled 100000 times:

In [78]: a
Out[78]: array([ 1,  3,  4,  5, 60, 43, 53,  4, 46, 54, 56, 78])

In [79]: a = np.tile(a,100000)

In [80]: N = 3

In [81]: T = 40

# Approach #3
In [82]: %%timeit
    ...: m = a>T
    ...: out = np.flatnonzero(m[:-2] & m[1:-1] & m[2:])
1000 loops, best of 3: 1.83 ms per loop

# Approach #1
In [83]: %timeit np.flatnonzero(np.convolve(a>T, np.ones(N, dtype=int),'valid')>=N)
100 loops, best of 3: 10.9 ms per loop

# Approach #2    
In [84]: %timeit np.flatnonzero(binary_erosion(a>T,np.ones(N, dtype=int), origin=-(N//2)))
100 loops, best of 3: 11.7 ms per loop

Answer 2

Try this, where array is the input NumPy array:

th=40
results = [ x for x in range( len( array ) -2 )  if(array[x:x+3].min() > th) ]

This is the list-comprehension form of the following loop:

th=40
results = []
for x in range( len( array ) -2 ):
    if( array[x:x+3].min() > th ):
        results.append( x )

Answer 3

Another approach, using numpy.lib.stride_tricks.as_strided:

In [59]: import numpy as np

In [60]: from numpy.lib.stride_tricks import as_strided

Define the input data:

In [61]: a = np.array([ 1,  3,  4,  5, 60, 43, 53,  4, 46, 54, 56, 78])

In [62]: N = 3

In [63]: threshold = 40

Compute the result; q is the boolean mask of the "large" values.

In [64]: q = a > threshold

In [65]: result = np.all(as_strided(q, shape=(len(q)-N+1, N), strides=(q.strides[0], q.strides[0])), axis=1).nonzero()[0]

In [66]: result
Out[66]: array([4, 8, 9])

Again, with N = 4:

In [67]: N = 4

In [68]: result = np.all(as_strided(q, shape=(len(q)-N+1, N), strides=(q.strides[0], q.strides[0])), axis=1).nonzero()[0]

In [69]: result
Out[69]: array([8])
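As a possible alternative to calling as_strided by hand, newer NumPy releases provide sliding_window_view in the same module, which builds the same kind of windowed view with bounds checking. The sketch below assumes NumPy 1.20 or later; it is a swapped-in variant of the technique above, not the code from this answer.

```python
import numpy as np
# Assumption: NumPy >= 1.20, where sliding_window_view was introduced.
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
N = 3
threshold = 40

q = a > threshold                      # boolean mask of the "large" values
windows = sliding_window_view(q, N)    # read-only view, shape (len(q) - N + 1, N)
result = np.flatnonzero(windows.all(axis=1))

print(windows.shape)  # (10, 3)
print(result)         # [4 8 9]
```

Like as_strided, this creates a view rather than a copy, but it computes the shape and strides itself, which removes the risk of reading past the end of the buffer through a mistyped shape.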
