Question

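I would like to measure the length of sub-arrays that fulfil some condition (like a stop clock), but as soon as the condition is no longer fulfilled, the value should reset to zero. So the resulting array should tell me how many consecutive values have fulfilled the condition (e.g. value > 1) at each position: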

[0, 0, 2, 2, 2, 2, 0, 3, 3, 0]

should produce the following array:

[0, 0, 1, 2, 3, 4, 0, 1, 2, 0]

It is easy to define a function in Python that returns the corresponding numpy array:

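import numpy as np

def StopClock(signal, threshold=1):
    # Count consecutive values above `threshold`; reset to 0 otherwise.
    clock = []
    current_time = 0
    for item in signal:
        if item > threshold:
            current_time += 1
        else:
            current_time = 0
        clock.append(current_time)
    return np.array(clock)

StopClock([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])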

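However, I don't really like this for loop, especially since the counter is supposed to run on much longer datasets. I thought of some np.diff solution combined with np.cumsum, but I didn't get the reset part to work. Does anyone know a more elegant, numpy-style solution to the problem above?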
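For what it's worth, the cumsum idea can be made to handle the reset without any grouping: take the running count of True values, remember that count at every position where the condition fails, and carry the most recent remembered value forward with a running maximum (`np.maximum.accumulate`). This is a different technique from the ones in the answers; the function name `stop_clock_cumsum` is hypothetical, and this is a sketch rather than a tested library routine.

```python
import numpy as np

def stop_clock_cumsum(signal, threshold=1):
    """Hypothetical helper: running count of consecutive values above
    `threshold`, reset to zero wherever the condition fails."""
    mask = np.asarray(signal) > threshold
    # Number of True values seen so far, including the current position.
    total = np.cumsum(mask)
    # At each False position keep the running total, elsewhere 0. Because
    # `total` never decreases, a running maximum carries the most recent
    # "reset level" forward to every later position.
    reset_level = np.maximum.accumulate(np.where(mask, 0, total))
    # Subtracting the last reset level restarts the count after each False.
    return total - reset_level

print(stop_clock_cumsum([0, 0, 2, 2, 2, 2, 0, 3, 3, 0]))
# [0 0 1 2 3 4 0 1 2 0]
```

Every step is a single whole-array NumPy operation, so there is no Python-level loop over elements or runs.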

Answer 1

This solution uses pandas to perform a groupby:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
threshold = 0
>>> np.where(
        s > threshold, 
        s
        .to_frame()  # Convert series to dataframe.
        .assign(_dummy_=1)  # Add column of ones.
        .groupby((s.gt(threshold) != s.gt(threshold).shift()).cumsum())['_dummy_']  # shift-cumsum pattern
        .transform(lambda x: x.cumsum()), # Cumsum the ones per group.
        0)  # Fill value with zero where threshold not exceeded.
array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])
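The runtime test at the end of the thread calls a function named `pandas_app`, whose body is not shown. A hypothetical wrapper around the same shift-cumsum groupby pattern might look like this; only the name comes from the benchmark, and the body is an assumption (it uses a Series of ones instead of the `_dummy_` column, and `shift(fill_value=False)` to keep the mask boolean).

```python
import numpy as np
import pandas as pd

def pandas_app(a, threshold=1):
    """Hypothetical reconstruction of Answer 1 as a function."""
    s = pd.Series(a)
    above = s.gt(threshold)
    # A new group id starts wherever the mask changes value (shift-cumsum pattern).
    group_id = above.ne(above.shift(fill_value=False)).cumsum()
    # Cumulative count of ones within each group = 1-based position in the run.
    counts = pd.Series(1, index=s.index).groupby(group_id).cumsum()
    # Zero out positions where the threshold is not exceeded.
    return np.where(above, counts, 0)

print(pandas_app(np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0]), threshold=1))
# [0 0 1 2 3 4 0 1 2 0]
```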

Answer 2

Another (somewhat silly) solution:

import numpy as np
a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

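def stop_clock(signal, threshold=1):
    mask = signal > threshold
    # Start index of every run of equal mask values (np.diff on booleans flags changes).
    indices = np.flatnonzero(np.diff(mask)) + 1
    # Cumsum each run separately: False runs stay 0, True runs count 1, 2, 3, ...
    return np.concatenate(list(map(np.cumsum, np.array_split(mask, indices))))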

stop_clock(a)
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])
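To see why this works, here are the intermediate values for the sample input. For boolean arrays, `np.diff` compares neighbours with `!=`, so it marks the positions where the mask flips; splitting there yields one piece per run.

```python
import numpy as np

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
mask = a > 1

flips = np.diff(mask)                    # True where the mask changes value
indices = np.flatnonzero(flips) + 1      # start index of each new run
pieces = np.array_split(mask, indices)   # one piece per run of equal values
ramps = [np.cumsum(p) for p in pieces]   # zeros for False runs, 1..n for True runs

print(indices)                           # [2 6 7 9]
print([p.tolist() for p in pieces])
# [[False, False], [True, True, True, True], [False], [True, True], [False]]
print(np.concatenate(ramps))             # [0 0 1 2 3 4 0 1 2 0]
```

Note that the `map` still calls `np.cumsum` once per run from Python, so the cost grows with the number of runs rather than being a fixed number of array operations.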

Answer 3

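Yes, we can use diff-style differentiation along with cumsum to create such intervaled ramps in a vectorized manner, which should be quite efficient, especially with large input arrays. The reset part is taken care of by assigning appropriate values at the end of each interval, so that the cumulative sum drops back to zero at the end of every interval.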

Here's one implementation that accomplishes all of that:

import numpy as np

def intervaled_ramp(a, thresh=1):
    mask = a>thresh

    # Get start, stop indices
    mask_ext = np.concatenate(([False], mask, [False] ))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0,s1 = idx[::2], idx[1::2]

    # 1 inside each run; at each stop, minus the run length so the cumsum resets to 0
    out = mask.astype(int)
    valid_stop = s1[s1<len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()
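A step-by-step trace on the question's sample shows the reset trick: each stop position (the first False after a run) receives minus the run length, so the cumulative sum returns to zero exactly there. The comments list the intermediate values for this input.

```python
import numpy as np

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
mask = a > 1

# Pad with False on both sides so every run has both a start and a stop edge.
mask_ext = np.concatenate(([False], mask, [False]))
idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])   # [2 6 7 9]
s0, s1 = idx[::2], idx[1::2]                           # starts [2 7], stops [6 9]

out = mask.astype(int)                                 # [0 0 1 1 1 1 0 1 1 0]
valid_stop = s1[s1 < len(a)]                           # drop a stop equal to len(a)
out[valid_stop] = s0[:len(valid_stop)] - valid_stop    # [0 0 1 1 1 1 -4 1 1 -2]
ramp = out.cumsum()                                    # [0 0 1 2 3 4 0 1 2 0]
print(ramp)
```

The `s1 < len(a)` filter matters when the last run reaches the end of the array: its stop index would be out of bounds, and no reset is needed after it anyway.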

Sample runs:

Input (a) : 
[5 3 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) : 
[1 2 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) : 
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=1)) : 
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 1]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=0)) : 
[1 2 3 4 5 0 0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 4 0 1]
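Beyond hand-picked samples, one way to gain confidence in the vectorized version is to compare it with the question's loop on many small random inputs, plus edge cases such as an empty array and runs touching either end. A sketch, assuming NumPy 1.17 or later for `np.random.default_rng`; `stop_clock_loop` is a hypothetical condensed copy of the question's `StopClock`, and `intervaled_ramp` is repeated with an added `np.asarray` so the block runs on its own.

```python
import numpy as np

def stop_clock_loop(signal, threshold=1):
    # Reference: the question's StopClock loop, lightly condensed.
    clock, current = [], 0
    for item in signal:
        current = current + 1 if item > threshold else 0
        clock.append(current)
    return np.array(clock, dtype=int)

def intervaled_ramp(a, thresh=1):
    # Answer 3's function (plus np.asarray so lists are accepted).
    a = np.asarray(a)
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

# Hand-picked edge cases: empty input, all above, none above, runs at both ends.
edge_cases = [[], [3, 3, 3], [0, 0, 0], [5, 0, 0, 5], [2, 0, 2, 2]]
for case in edge_cases:
    arr = np.array(case, dtype=int)
    assert np.array_equal(intervaled_ramp(arr), stop_clock_loop(arr)), case

# Random small inputs with several thresholds.
rng = np.random.default_rng(0)
for _ in range(500):
    arr = rng.integers(0, 4, size=int(rng.integers(0, 30)))
    t = int(rng.integers(0, 3))
    assert np.array_equal(intervaled_ramp(arr, thresh=t),
                          stop_clock_loop(arr, threshold=t))
```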

Runtime test

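One way to do a fair benchmark is to take the sample posted in the question, tile it a large number of times and use that as the input array. With that setup, here are the timings: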

In [841]: a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

In [842]: a = np.tile(a,10000)

# @Alexander's soln (the pandas groupby answer, wrapped in a function)
In [843]: %timeit pandas_app(a, threshold=1)
1 loop, best of 3: 3.93 s per loop

# @Psidom 's soln
In [844]: %timeit stop_clock(a, threshold=1)
10 loops, best of 3: 119 ms per loop

# Proposed in this post
In [845]: %timeit intervaled_ramp(a, thresh=1)
1000 loops, best of 3: 527 µs per loop
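Outside IPython, a similar comparison can be scripted with the standard `timeit` module. This is a sketch only: it reuses the tiled input from above, compares the question's loop, `intervaled_ramp`, and the hypothetical `stop_clock_cumsum` (a running-maximum variant that is not one of the posted answers), and checks that all of them agree before timing. No timings are claimed here; they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def stop_clock_loop(signal, threshold=1):
    # The question's loop, condensed.
    clock, current = [], 0
    for item in signal:
        current = current + 1 if item > threshold else 0
        clock.append(current)
    return np.array(clock, dtype=int)

def intervaled_ramp(a, thresh=1):
    # Answer 3's function.
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

def stop_clock_cumsum(signal, threshold=1):
    # Hypothetical variant: cumsum minus a running maximum of reset levels.
    mask = np.asarray(signal) > threshold
    total = np.cumsum(mask)
    return total - np.maximum.accumulate(np.where(mask, 0, total))

# Same input as the IPython session: the question's sample tiled 10000 times.
a = np.tile(np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0]), 10000)

candidates = {
    "loop": lambda: stop_clock_loop(a, threshold=1),
    "intervaled_ramp": lambda: intervaled_ramp(a, thresh=1),
    "cumsum_reset": lambda: stop_clock_cumsum(a, threshold=1),
}

# The timings only mean something if every version returns the same array.
expected = candidates["loop"]()
for name, fn in candidates.items():
    assert np.array_equal(fn(), expected), name

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds.
    timings[name] = min(timeit.repeat(fn, repeat=3, number=5)) / 5 * 1e3
    print(f"{name:16s} {timings[name]:9.3f} ms per call")
```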

Create intervaled ramp array based on threshold - Python / NumPy
