Question

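I have an array of values, t, that is always in increasing order (but not always uniformly spaced). I have another single value, x. I need to find the index in t such that t[index] is closest to x. The function must return zero for x < t.min() and the max index (or -1) for x > t.max().

I've written two functions to do this. The first one, f1, is quicker in this simple timing test, but I like that the second one is just one line. This calculation will be done on a large array, potentially many times per second.

Can anyone come up with another function whose timing is comparable to the first but whose code is cleaner? How about something quicker than the first (speed is most important)?

Thanks!

Code: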

import numpy as np
import timeit

t = np.arange(10,100000)         # Not always uniform, but in increasing order
x = np.random.uniform(10,100000) # Some value to find within t

def f1(t, x):
   ind = np.searchsorted(t, x)   # Get index to preserve order
   ind = min(len(t)-1, ind)      # In case x > max(t)
   ind = max(1, ind)             # In case x < min(t)
   if x < (t[ind-1] + t[ind]) / 2.0:   # Closer to the smaller number
      ind = ind-1
   return ind

def f2(t, x):
   return np.abs(t-x).argmin()

print(t,           '\n', x,           '\n')
print(f1(t, x),    '\n', f2(t, x),    '\n')
print(t[f1(t, x)], '\n', t[f2(t, x)], '\n')

runs = 1000
time = timeit.Timer('f1(t, x)', 'from __main__ import f1, t, x')
print(round(time.timeit(runs), 6))

time = timeit.Timer('f2(t, x)', 'from __main__ import f2, t, x')
print(round(time.timeit(runs), 6))

Answer 1

This seems faster (for me, Python 3.2-win32, numpy 1.6.0):

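from bisect import bisect_left
def f3(t, x):
    # Assumes unit spacing in t (as in the test data) and t[0] <= x <= t[-1]
    i = bisect_left(t, x)   # First index with t[i] >= x
    if t[i] - x > 0.5:      # More than half a step above x: left neighbour is closer
        i-=1
    return i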

Output:

[   10    11    12 ..., 99997 99998 99999]
37854.22200356027
37844
37844
37844
37854
37854
37854
f1 0.332725
f2 1.387974
f3 0.085864
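
The 0.5 threshold in f3 matches the unit spacing of the test array, and the function does not guard against x falling outside the range of t. A minimal sketch of a bisect-based variant that compares the distances to both neighbours and clamps at the ends follows; the name nearest_index_bisect and the sample data are made up for illustration, and this variant was not part of the timings above.

```python
from bisect import bisect_left

import numpy as np


def nearest_index_bisect(t, x):
    """Index of the element of sorted t closest to x (hypothetical helper)."""
    i = bisect_left(t, x)            # first index with t[i] >= x
    if i == 0:                       # x <= t[0]
        return 0
    if i == len(t):                  # x > t[-1]
        return len(t) - 1
    # Compare the distances to both neighbours; ties go to the larger index,
    # which is also what f1 in the question does
    return i - 1 if x - t[i - 1] < t[i] - x else i


t = np.array([10.0, 11.0, 15.0, 40.0, 41.5])   # increasing, unevenly spaced
for x in (5.0, 12.9, 13.1, 100.0):
    print(x, nearest_index_bisect(t, x))
```

The two extra comparisons cost little next to the bisection itself, but how the speed compares with f1 would need to be measured on the real data.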

Answer 2

Use searchsorted:

import numpy as np

t = np.arange(10,100000)         # Not always uniform, but in increasing order
x = np.random.uniform(10,100000)

print(t.searchsorted(x))

Edit

Ah yes, I see that is what you do in f1. Maybe the f3 below is easier to read than f1.

def f3(t, x):
    ind = t.searchsorted(x)
    if ind == len(t):
        return ind - 1 # x > max(t)
    elif ind == 0:
        return 0
    before = ind-1
    if x-t[before] < t[ind]-x:
        ind -= 1
    return ind
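
Since the question mentions doing this many times per second, it may be worth noting that searchsorted also accepts an array of query values, so the same logic can be applied to a whole batch in one call. A sketch, assuming the queries can be collected into an array and t has at least two elements (nearest_indices is an illustrative name, not something from the question):

```python
import numpy as np


def nearest_indices(t, xs):
    """Nearest index in sorted 1-D t for each value in xs (illustrative helper)."""
    xs = np.asarray(xs)
    ind = np.searchsorted(t, xs)           # insertion points, in 0..len(t)
    ind = np.clip(ind, 1, len(t) - 1)      # keep both ind-1 and ind valid
    left_is_closer = xs - t[ind - 1] < t[ind] - xs
    return np.where(left_is_closer, ind - 1, ind)


t = np.array([10.0, 11.0, 15.0, 40.0, 41.5])   # increasing, unevenly spaced
xs = [5.0, 12.9, 13.1, 100.0]
print(nearest_indices(t, xs))
```

Clipping the insertion point to the range 1..len(t)-1 plays the same role as the min/max pair in f1, so values below t.min() map to index 0 and values above t.max() map to the last index.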

Answer 3

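np.searchsorted is a binary search (it splits the array in half each time). So you have to implement it in a way that returns the last value smaller than x instead of returning zero.

Look at this algorithm (from here):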

def binary_search(a, x):
    lo=0
    hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return lo-1 if lo > 0 else 0

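I just replaced the last line (it was return -1) and changed the arguments.

Since the loop is written in Python, it may be slower than the first one... (not benchmarked)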
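
One way to check the speed is a small timeit harness like the one below. It is only a sketch: searchsorted_floor is a hypothetical helper written for this comparison, the query value is arbitrary, and no timings are claimed here since they depend on the machine and library versions.

```python
import timeit

import numpy as np


def binary_search(a, x):
    """Pure-Python bisection as above: index of the last value <= x (0 if none)."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        elif a[mid] > x:
            hi = mid
        else:
            return mid
    return lo - 1 if lo > 0 else 0


def searchsorted_floor(a, x):
    """Same result for strictly increasing a, using NumPy's compiled search."""
    return max(int(np.searchsorted(a, x, side='right')) - 1, 0)


t = np.arange(10, 100000)
x = 37854.222                      # arbitrary query value
assert binary_search(t, x) == searchsorted_floor(t, x)

for fn in (binary_search, searchsorted_floor):
    seconds = timeit.timeit(lambda: fn(t, x), number=1000)
    print(fn.__name__, round(seconds, 6))
```

Both functions return the index of the last element not greater than x, not the nearest element, so a neighbour comparison like the one in f1 is still needed afterwards.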

Python / Numpy - Quickly find the index in an array closest to some value