Question

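Suppose I have a 1D numpy array holding a noisy data series.

I want to establish a threshold to check when the values are high and when they are low. However, because the data is noisy, it makes no sense to simply do: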

is_high = data > threshold

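I am trying to give this threshold a tolerance, as many control systems do (for example, most heating and air-conditioning systems). The idea is that the state of the signal only changes from low to high when the value rises above the threshold plus the tolerance. Likewise, the signal only changes from high to low when the value drops below the threshold minus the tolerance. In other words: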

import numpy as np

def tolerance_filter(data, threshold, tolerance):
    currently_high = False  # start low
    # np.bool was removed in NumPy 1.24; the builtin bool is equivalent
    signal_state = np.empty_like(data, dtype=bool)
    for i in range(data.size):
        # if we were high and we are getting too low, become low
        if currently_high and data[i] < (threshold-tolerance):
            currently_high = False
        # if we were low and are getting too high, become high
        elif not currently_high and data[i] > (threshold+tolerance):
            currently_high = True
        signal_state[i] = currently_high
    return signal_state

This function gives the output I expect. However, I wonder whether there is a way to do this with the speed of numpy or scipy instead of a raw Python for loop.

Any ideas? :)

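UPDATE:

Thanks to Joe Kington's comment pointing me to the term hysteresis, I found this other question. I am afraid it is very similar (a duplicate?), and Bas Swinckels has a nice working solution there.

Anyway, I tried to implement the speed-up suggested by Joe Kington (I don't know whether I did it right) and compared his solution, Fergal's, and Bas's with my naive approach. Here are the results (code below):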

Proposed function in my original question
10 loops, best of 3: 22.6 ms per loop

Proposed function by Fergal
1000 loops, best of 3: 995 µs per loop

Proposed function by Bas Swinckels in the hysteresis question
1000 loops, best of 3: 1.05 ms per loop

Proposed function by Joe Kington using Cython
Approximate time cost of compiling: 2.195411
1000 loops, best of 3: 1.35 ms per loop

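All the approaches in the answers perform similarly (although Fergal's needs some extra steps to get the boolean vector!). Any considerations to add here? Also, I am surprised that the Cython approach is slower (although only slightly). Anyway, I have to admit that it may be the quickest to code if you don't know all the numpy functions...

Here is the code I used to benchmark the different options. Reviews and revisions are more than welcome! :P (The Cython code is in the middle to force SO to keep all the code in the same scrollable block. Of course, I have it in a different file.)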

# Naive approach from the original question
def tolerance_filter1(data, threshold, tolerance):
    currently_high = False  # start low
    signal_state = np.empty_like(data, dtype=bool)
    for i in range(data.size):
        # if we were high and we are getting too low, become low
        if currently_high and data[i] < (threshold-tolerance):
            currently_high = False
        # if we were low and are getting too high, become high
        elif not currently_high and data[i] > (threshold+tolerance):
            currently_high = True
        signal_state[i] = currently_high
    return signal_state

# Numpythonic approach suggested by Fergal
def tolerance_filter2(data, threshold, tolerance):
    a = np.zeros_like(data)
    a[ data < threshold-tolerance] = -1
    a[ data > threshold+tolerance] = +1
    wh = np.where(a != 0)[0]
    idx= np.diff( a[wh]) == 2
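    # idx has one element fewer than wh, so drop the last entry of wh.
    # The result indexes the last low sample before each crossing from
    # below threshold-tolerance to above threshold+tolerance
    # (wh[1:][idx] would give the first high sample instead).
    crossesAboveThreshold = wh[:-1][idx]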
    return crossesAboveThreshold

# Approach suggested by Bas Swinckels and borrowed
# from the hysteresis question
def tolerance_filter3(data, threshold, tolerance, initial=False):
    hi = data >= threshold+tolerance
    lo_or_hi = (data <= threshold-tolerance) | hi
    ind = np.nonzero(lo_or_hi)[0]
    if not ind.size: # prevent index error if ind is empty
        return np.zeros_like(data, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi) # from 0 to len(data)
    return np.where(cnt, hi[ind[cnt-1]], initial)

#########################################################
## IN A DIFFERENT FILE (tolerance_filter_cython.pyx)
## So that StackOverflow shows a single scrollable code block :)

import numpy as np
import cython

@cython.boundscheck(False)
def tolerance_filter(data, float threshold, float tolerance):
    cdef bint currently_high = 0  # start low
    signal_state = np.empty_like(data, dtype=int)
    cdef double[:] data_view = data
    cdef long[:] signal_state_view = signal_state
    cdef int i = 0
    cdef int l = len(data)
    low = np.empty_like(data, dtype=bool)
    high = np.empty_like(data, dtype=bool)
    low = data < (threshold-tolerance)
    high = data > (threshold+tolerance)

    for i in range(l):
        # if we were high and we are getting too low, become low
        if currently_high and low[i]:
            currently_high = False
        # if we were low and are getting too high, become high
        elif not currently_high and high[i]:
            currently_high = True
        signal_state_view[i] = currently_high
    return signal_state

##################################################################
# BACK TO THE PYTHON FILE

import numpy as np
from datetime import datetime
from IPython import get_ipython
ipython = get_ipython()
time = np.arange(0,1000,0.01)
data = np.sin(time*3) + np.cos(time/7)*8 + np.random.normal(size=time.shape)*2
threshold, tolerance = 0, 4

print("Proposed function in my original question")
ipython.run_line_magic("timeit", "tolerance_filter1(data, threshold, tolerance)")

print("\nProposed function by Fergal")
ipython.run_line_magic("timeit", "tolerance_filter2(data, threshold, tolerance)")

print("\nProposed function by Bas Swinckels in the hysteresis question")
ipython.run_line_magic("timeit", "tolerance_filter3(data, threshold, tolerance)")

print("\nProposed function by Joe Kington using Cython")
start = datetime.now()
import pyximport; pyximport.install()
import tolerance_filter_cython
print("Approximate time cost of compiling: {}".format((datetime.now()-start).total_seconds()))
tolerance_filter4 = tolerance_filter_cython.tolerance_filter
ipython.run_line_magic("timeit", "tolerance_filter4(data, threshold, tolerance)")
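Before comparing timings, it may be worth checking that the implementations agree on the same input. The following is a minimal sketch of such a check, not part of the original benchmark: `loop_filter` and `vectorized_filter` are hypothetical names for copies of the naive loop and of the approach borrowed from the hysteresis question, and the fixed seed and shorter series are arbitrary assumptions.

```python
import numpy as np

def loop_filter(data, threshold, tolerance):
    # Reference implementation: explicit state machine that starts low.
    state = np.empty(data.shape, dtype=bool)
    high = False
    for i, value in enumerate(data):
        if high and value < threshold - tolerance:
            high = False
        elif not high and value > threshold + tolerance:
            high = True
        state[i] = high
    return state

def vectorized_filter(data, threshold, tolerance, initial=False):
    # Forward-fill the state of the most recent sample outside the band.
    hi = data >= threshold + tolerance
    lo_or_hi = (data <= threshold - tolerance) | hi
    ind = np.nonzero(lo_or_hi)[0]
    if not ind.size:  # no sample ever leaves the band
        return np.zeros_like(data, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi)
    return np.where(cnt, hi[ind[cnt - 1]], initial)

rng = np.random.default_rng(0)  # fixed seed (assumption, for repeatability)
t = np.arange(0, 100, 0.01)
data = np.sin(t * 3) + np.cos(t / 7) * 8 + rng.normal(size=t.shape) * 2

results_match = np.array_equal(loop_filter(data, 0, 4),
                               vectorized_filter(data, 0, 4))
print(results_match)
```

Note that the two versions differ in one detail: the loop uses strict comparisons while the vectorized version uses `>=` and `<=`, so they could disagree only for a sample lying exactly on one of the two bounds.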

Answer 1

I think it is sometimes surprising to see how easy Cython extensions are to use from Python. Here is your code converted to Cython. It can be called from Python, but should give you C-level speed.

[The Cython listing is not preserved in this copy.]

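A few things to note:

Note that the function uses typed memory views.
The function is intentionally kept as close to the original code as possible. However, it could be sped up by turning off bounds checking (see the Cython documentation) and by computing the upper and lower effective thresholds outside the loop.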
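The second suggestion can be illustrated in plain Python (this is not the Cython code from the answer, only a sketch of the idea): the two effective thresholds are computed once before the loop instead of on every iteration. The names `tolerance_filter_hoisted`, `upper` and `lower` are hypothetical; in a Cython version the same two values could presumably be declared as C doubles.

```python
import numpy as np

def tolerance_filter_hoisted(data, threshold, tolerance):
    # Compute the two effective thresholds once, outside the loop.
    upper = threshold + tolerance
    lower = threshold - tolerance
    state = np.empty(len(data), dtype=bool)
    high = False  # start low
    for i, value in enumerate(data):
        if high:
            # While high, only a drop below the lower bound changes state.
            if value < lower:
                high = False
        elif value > upper:
            # While low, only a rise above the upper bound changes state.
            high = True
        state[i] = high
    return state
```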

Answer 2

I'm not sure this is any better than your solution, but it is more numpythonic.

a = np.zeros_like(data)
a[ data < threshold-tol] = -1
a[ data > threshold+tol] = +1
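wh = np.where(a != 0)[0]  # np.where returns a tuple; take the index array
idx= np.diff( a[wh]) == 2
# idx has one element fewer than wh, so drop the last entry of wh.
# The result indexes the last low sample before each crossing from
# below threshold-tol to above threshold+tol
# (wh[1:][idx] would give the first high sample instead).
crossesAboveThreshold = wh[:-1][idx]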
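The snippet above yields crossing indices rather than a per-sample state. One possible way to get the boolean high/low vector from the same -1/0/+1 marks is to carry the last nonzero mark forward. This is a sketch under stated assumptions, not part of the original answer: `hysteresis_state` and its `initial` parameter are hypothetical, and the forward-fill uses `np.maximum.accumulate`.

```python
import numpy as np

def hysteresis_state(data, threshold, tol, initial=False):
    """Boolean high/low state built from -1/0/+1 marks (hypothetical helper)."""
    data = np.asarray(data)
    # Mark samples outside the dead band: -1 below, +1 above, 0 inside.
    a = np.zeros(data.shape, dtype=int)
    a[data < threshold - tol] = -1
    a[data > threshold + tol] = +1
    # For every sample, find the index of the most recent nonzero mark
    # (-1 where no mark has been seen yet).
    idx = np.where(a != 0, np.arange(a.size), -1)
    last = np.maximum.accumulate(idx)
    # Before the first mark, fall back to the initial state.
    return np.where(last >= 0, a[last] > 0, initial)

state = hysteresis_state(np.array([0, 5, 1, -1, -5, 0, 3, 6]), 0, 4)
print(state)
```

Samples inside the band inherit the state of the last sample that left it, which matches the behaviour of the loop in the question when the signal starts low.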
