Question

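Given a 1D array of values:

A = [x,..,x,0,..,0,x,..,x,0,..,0,x,..,x,........]

where:

x,..,x stands for any number of arbitrary values

and

0,..,0 stands for any number of zeros

I need a fast algorithm to find the indices of the borders, i.e. ..,x,0,.. and ..,0,x..

This problem seems to lend itself to parallelization, but that is beyond my experience. A simple loop over the array is too slow because the data is large.

Thanks, Martin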

Answer 1

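@chthonicdaemon's answer gets you 90% of the way there, but if you actually want to use the indices to chop up your array, you need some additional information.

Presumably, you want to use the indices to extract the regions of the array that are not 0. You have already found the indices where the array changes, but you don't know whether a given change is from True to False or the other way around. You therefore need to check the first and last values and adjust accordingly. Otherwise, in some cases you will end up extracting the segments of zeros instead of the data.

For example: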

import numpy as np

def contiguous_regions(condition):
    """Finds contiguous True regions of the 1D boolean array "condition".
    Returns a 2D array where the first column is the start index of the region
    and the second column is the end index."""
    # Find the indices of changes in "condition"
    idx = np.flatnonzero(np.diff(condition)) + 1

    # Prepend or append the start or end indices to "idx"
    # if there's a block of "True"'s at the start or end...
    if condition[0]:
        idx = np.append(0, idx)
    if condition[-1]:
        idx = np.append(idx, len(condition))

    return idx.reshape(-1, 2)

# Generate an example dataset...
t = np.linspace(0, 4*np.pi, 20)
x = np.abs(np.sin(t)) + 0.1
x[np.sin(t) < 0.5] = 0

print(x)

# Get the contiguous regions where x is not 0
for start, stop in contiguous_regions(x != 0):
    print(x[start:stop])

So in this case, our example dataset looks like:

array([ 0.        ,  0.71421271,  1.06940027,  1.01577333,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.93716648,  1.09658449,  0.83572391,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ])

By doing:

for start, stop in contiguous_regions(x != 0):
    print(x[start:stop])

we get:

[ 0.71421271  1.06940027  1.01577333]
[ 0.93716648  1.09658449  0.83572391]
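
As a possible variation on the function above (not the approach this answer uses), the two edge checks can be avoided by padding the condition with False on both ends, so that every True run is guaranteed to have both a start and an end transition. The name contiguous_regions_padded is made up for this sketch, and it assumes a 1D boolean input:

```python
import numpy as np

def contiguous_regions_padded(condition):
    """Return an (n, 2) array of [start, stop) indices of True runs.

    Hypothetical variant: pads with False on both sides instead of
    inspecting condition[0] and condition[-1].
    """
    condition = np.asarray(condition, dtype=bool)
    padded = np.concatenate(([False], condition, [False]))
    # A transition sits wherever neighbouring padded entries differ.
    # Because of the leading pad, position i in this comparison
    # corresponds to index i of the original array.
    idx = np.flatnonzero(padded[1:] != padded[:-1])
    # Transitions alternate False->True, True->False, so they pair up.
    return idx.reshape(-1, 2)

x = np.array([0.0, 0.7, 1.1, 0.0, 0.0, 0.9])
for start, stop in contiguous_regions_padded(x != 0):
    print(x[start:stop])
```

Because the padded array always begins and ends with False, the number of transitions is even, so the reshape should not fail. An empty input simply yields an empty (0, 2) result.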

Answer 2

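This should at least push the looping into the Numpy primitives, although it will traverse the array three times:

import numpy as np
A = 2*(np.random.rand(200000) > 0.2)  # testing data
borders = np.flatnonzero(np.diff(A == 0))

On my computer this takes 1.79 ms.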
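
To make the meaning of the returned indices concrete, here is a small worked example on a hand-written array (the array and the variable names are only illustrative). It shows that each border index points at the last element before a change, and that the element after it tells you the direction of the change:

```python
import numpy as np

A = np.array([3, 7, 0, 0, 0, 5, 1, 0, 2])
is_zero = (A == 0)

# On a boolean array, np.diff marks positions where neighbours differ.
borders = np.flatnonzero(np.diff(is_zero))
print(borders)        # index of the last element before each change

# Adding 1 gives the first index of each new run.
run_starts = borders + 1

# Split by direction: does the new run consist of zeros or of data?
zero_run_starts = run_starts[is_zero[run_starts]]    # ..,x,0,..
data_run_starts = run_starts[~is_zero[run_starts]]   # ..,0,x,..
print(zero_run_starts, data_run_starts)
```

Here the changes happen after indices 1, 4, 6 and 7, so zero runs begin at 2 and 7 and data runs begin at 5 and 8.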

Numpy detection of region borders
