Question

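I'm trying to write a Python program that takes input data from an eye-tracking device and checks whether it falls within given ranges. The input is a normalized value corresponding to the x position of the gaze. The ranges are always pre-sorted. I need to check whether this position x lies within the bounds of any pair of elements in a 2D array, and if so, run a function. Something like: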

x = 0.23 # input variable
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]

for i, pair in enumerate(boundaries):
    if x >= pair[0] and x <= pair[1]:
        print(i) # some function
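With inclusive comparisons on both ends, a value sitting exactly on a shared endpoint (0.025 or 0.1 in the list above) matches two pairs, while a value in the gap between 0.14 and 0.15 matches none. A small sketch that wraps the same loop in a function (the name `matching_indices` is hypothetical) makes this behavior easy to check:

```python
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]


def matching_indices(x, boundaries):
    """Return the index of every [low, high] pair with low <= x <= high."""
    # The chained comparison is equivalent to x >= low and x <= high.
    return [i for i, (low, high) in enumerate(boundaries) if low <= x <= high]


print(matching_indices(0.23, boundaries))   # [3]
print(matching_indices(0.025, boundaries))  # [0, 1]  (shared endpoint)
print(matching_indices(0.145, boundaries))  # []      (gap between 0.14 and 0.15)
```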

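Now, the problem is that x is real-time data arriving at 60 Hz, and the boundaries can sometimes be a long list (1000 elements), so this approach would mean tens of thousands of checks per second (60 samples × 1000 pairs). What is the most efficient way to compute this? I figured there might be a nice vectorized version in numpy, but I'm not very good at math.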
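Because the ranges are pre-sorted, one option (a different technique from the ones in the answers below) is a binary search with the standard-library `bisect` module, which needs O(log n) comparisons per sample instead of O(n). A minimal sketch, assuming the pairs are sorted by lower bound and do not overlap apart from shared endpoints; unlike the loop above, it returns at most one index, and on a shared endpoint it picks the later pair. The function name `find_interval` is hypothetical:

```python
from bisect import bisect_right

boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]

# Precompute the lower bounds once, outside the 60 Hz loop.
lows = [low for low, _ in boundaries]


def find_interval(x, boundaries, lows):
    """Return the index of the pair containing x, or None.

    Assumes pairs are sorted by lower bound and do not overlap (adjacent
    pairs may share an endpoint; in that case the later pair wins).
    """
    # Index of the last pair whose lower bound is <= x.
    i = bisect_right(lows, x) - 1
    if i >= 0 and x <= boundaries[i][1]:
        return i
    return None


print(find_interval(0.23, boundaries, lows))   # 3
print(find_interval(0.145, boundaries, lows))  # None
```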

I ran a test to see whether the solution @Meitham posted in an answer makes a significant difference, but with the pure-Python approach I get:

import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
x = 0.23
final = []

for i, pair in enumerate(boundaries):
    if x >= pair[0] and x <= pair[1]:
        final.append(pair)

100000000 loops, best of 3: 0.0121 usec per loop

And with the numpy approach:

import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
x = 0.23
a = np.array(boundaries)
final = a[(a[...,0] < x) & (a[...,1] > x)]

100000000 loops, best of 3: 0.0122 usec per loop
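The two versions above also differ at the endpoints: the loop uses inclusive `>=`/`<=`, while the numpy mask uses strict `<`/`>`. A sketch that builds the array once (here with `np.column_stack` rather than a Python list) and returns matching indices with inclusive bounds, to match the loop; the function name `matching_indices_np` is hypothetical:

```python
import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
# Build the (n, 2) array directly instead of going through a Python list.
a = np.column_stack((b1, b1 + 0.01))


def matching_indices_np(x, a):
    """Return indices of rows [low, high] with low <= x <= high."""
    mask = (a[:, 0] <= x) & (x <= a[:, 1])
    return np.flatnonzero(mask)


x = 0.23
idx = matching_indices_np(x, a)
print(a[idx])  # the matching rows, same as a[mask]
```

Only the mask computation needs to run per gaze sample; the array itself can be built once when the boundaries change.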

So I don't see any meaningful difference between the two approaches. Maybe I'm testing it the wrong way?
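A loop over 10,000 pairs cannot finish in about 12 ns, so the figures above most likely timed something other than the search itself. A sketch using the standard `timeit` module that builds the data once and times only the lookup, wrapped in two hypothetical helper functions (`python_search`, `numpy_search`); no particular numbers are implied:

```python
import timeit

import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
a = np.array(boundaries)
x = 0.23


def python_search():
    return [pair for pair in boundaries if pair[0] <= x <= pair[1]]


def numpy_search():
    return a[(a[:, 0] <= x) & (x <= a[:, 1])]


# Both approaches should agree before their speed is compared.
assert len(python_search()) == len(numpy_search())

for name, func in [("python", python_search), ("numpy", numpy_search)]:
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=5)) / 100
    print(f"{name}: {best * 1e6:.1f} us per call")
```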

Answer 1

x = 0.23
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]
filter_list = [item for item in boundaries if x >= item[0] and x <= item[1]]
print(filter_list)
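If you only need to know whether x falls in any range, or the index of the first range that contains it, a generator expression lets Python stop scanning at the first match. A small sketch using the built-ins `any()` and `next()`:

```python
x = 0.23
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]

# True as soon as one pair contains x; remaining pairs are not examined.
inside_any = any(low <= x <= high for low, high in boundaries)

# Index of the first pair containing x, or None if there is none.
first = next((i for i, (low, high) in enumerate(boundaries) if low <= x <= high), None)

print(inside_any, first)  # True 3
```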

Answer 2


This uses numpy, and also works if a boundary pair is reversed (like [1.0, 0.5] below):

>>> import numpy as np    
>>> boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25], [1.0, 0.5]]
>>> x = 0.23

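>>> a = np.array(boundaries)
>>> (np.min(a, 1) < x) & (x < np.max(a, 1))
array([False, False, False,  True, False], dtype=bool)
>>> small, large = np.min(a, 1), np.max(a, 1)
>>> a[(small < x) & (x < large)]
array([[ 0.15,  0.25]])

If the boundary points (x, y) are guaranteed to satisfy x < y, the min/max step can be skipped:

>>> a[(a[...,0] < x) & (a[...,1] > x)]
array([[ 0.15,  0.25]])

A pure Python solution without numpy could look like this:

>>> [(low, high) for (low, high) in boundaries if low <= x <= high]

For small sequences the Python solution may look fast, as the example shows, but when the boundary sequence is large, as you describe in the question, numpy would shine.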

However, with a larger sequence of only 1000 elements:

>>> %timeit [(low, high) for (low, high) in boundaries if high <= x <= low]
The slowest run took 26.73 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 419 ns per loop

>>> %timeit a[(a[...,0] < x) & (a[...,1] > x)]
The slowest run took 26.76 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.97 µs per loop

>>> %timeit (np.min(a, 1) < x) & (x < np.max(a, 1))
The slowest run took 40.57 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.22 µs per loop
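Since the boundaries are pre-sorted, `np.searchsorted` offers a vectorized binary search, which may help if several gaze samples are buffered and processed together. This is a different technique from the boolean masks above; a sketch assuming rows sorted by lower bound and non-overlapping (shared endpoints go to the later row), with a hypothetical function name `find_intervals`:

```python
import numpy as np

boundaries = np.array([[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]])
lows, highs = boundaries[:, 0], boundaries[:, 1]


def find_intervals(xs, lows, highs):
    """Vectorized binary search: interval index for each x, or -1 if none.

    Assumes rows are sorted by lower bound and do not overlap.
    """
    xs = np.asarray(xs)
    # side="right" returns the count of lower bounds <= x; subtract 1 for the index.
    i = np.searchsorted(lows, xs, side="right") - 1
    # Clip so indexing is valid for x below the first lower bound, then mask.
    ok = (i >= 0) & (xs <= highs[np.clip(i, 0, None)])
    return np.where(ok, i, -1)


print(find_intervals([0.23, 0.145, 0.05, -0.1], lows, highs).tolist())  # [3, -1, 1, -1]
```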

Efficient way to check whether a value lies within boundaries defined in a 2D array
