Question

Suppose I have an array of times. I know a priori that the maximum time is 1, say, so the array may look like

events = [0.1, 0.2, 0.7, 0.93, 1.37]

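The numbers in this array are the times at which events occurred within the time interval [0,1] (and I ignore anything larger than 1). I don't know the size of the array a priori, but I do have a reasonable upper bound on its size (if that matters), so I can even safely truncate it if needed.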

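I need to convert this array into an array that counts the number of events up to time x, where x ranges over a set of evenly spaced numbers in the time interval (as produced by linspace). So, for example, if the granularity (= size) of that array is 7, the result of my function should be: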

def count_events(events, granularity):
    ...

>>> count_events([0.1, 0.2, 0.7, 0.93, 1.37], 7)
array([0, 1, 2, 2, 2, 3, 4])
# since it checks at times 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1.

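I am looking for an efficient solution. Writing a loop here would be easy, but my event arrays can be large. In fact, they are not 1D but 2D, and this counting operation should work per axis (like many other numpy functions). More precisely, here is a 2D example: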

def count_events(events, granularity, axis=None):
    ...

>>> events = array([[0.1, 0.2, 0.7, 0.93, 1.37], [0.01, 0.01, 0.9, 2.5, 3.3]])
>>> count_events(events, 7, axis=1)
array([[0, 1, 2, 2, 2, 3, 4],
       [0, 2, 2, 2, 2, 2, 3]])

Answer 1

You can simply use np.searchsorted -

np.searchsorted(events, d) # with events being a 1D array

where d is the linspaced array, created like so -

d = np.linspace(0,1,7) # 7 being the interval size

Sample run for the 2D case -

In [548]: events
Out[548]: 
array([[ 0.1 ,  0.2 ,  0.7 ,  0.93,  1.37],
       [ 0.01,  0.01,  0.9 ,  2.5 ,  3.3 ]])

In [549]: np.searchsorted(events[0], d) # Use per row
Out[549]: array([0, 1, 2, 2, 2, 3, 4])

In [550]: np.searchsorted(events[1], d)
Out[550]: array([0, 2, 2, 2, 2, 2, 3])
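The per-row calls above can be wrapped into the signature asked for in the question. This is only a minimal sketch: it assumes the event times are already sorted along the chosen axis, and it passes side="right" (my choice, not part of the answer above) so that an event falling exactly on a sample time is counted; with the default side="left" such an event would be excluded.

```python
import numpy as np

def count_events(events, granularity, axis=None):
    """Count events up to each of `granularity` evenly spaced times in [0, 1].

    Assumes the event times are sorted in ascending order along `axis`.
    """
    events = np.asarray(events, dtype=float)
    d = np.linspace(0.0, 1.0, granularity)
    if axis is None:
        # Intended for 1D input: treat it as one sorted sequence of times.
        return np.searchsorted(events.ravel(), d, side="right")
    # Run the 1D search independently on every slice along `axis`.
    return np.apply_along_axis(
        lambda row: np.searchsorted(row, d, side="right"), axis, events
    )

events = np.array([[0.1, 0.2, 0.7, 0.93, 1.37],
                   [0.01, 0.01, 0.9, 2.5, 3.3]])
print(count_events(events[0], 7))
print(count_events(events, 7, axis=1))
```

Note that np.apply_along_axis still loops over the rows in Python, so this is a convenience wrapper rather than a speed-up.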

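Using searchsorted2d, a vectorized version of searchsorted (a helper function, not part of NumPy), we can even vectorize the whole thing and process all rows in one go, like so -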

In [552]: searchsorted2d(events,d)
Out[552]: 
array([[0, 1, 2, 2, 2, 3, 4],
       [0, 2, 2, 2, 2, 2, 3]])
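The helper itself is not shown here, so the following is only one possible way to write such a row-wise search, not necessarily the implementation referred to above. The name searchsorted_rows is hypothetical. The idea is to shift each row by a distinct offset so that all rows can be concatenated into one globally sorted array and searched with a single np.searchsorted call. It assumes every row is sorted and the values are finite.

```python
import numpy as np

def searchsorted_rows(a, v, side="left"):
    """Row-wise np.searchsorted of the 1D queries `v` in each sorted row of `a`."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    m, n = a.shape
    lo = min(a.min(), v.min())
    hi = max(a.max(), v.max())
    # Each row gets its own band of width `span`, so bands never overlap.
    span = hi - lo + 1.0
    offsets = span * np.arange(m)[:, None]
    flat = (a - lo + offsets).ravel()          # sorted across all rows
    queries = (v[None, :] - lo + offsets).ravel()
    idx = np.searchsorted(flat, queries, side=side).reshape(m, -1)
    # Subtract the number of elements that belong to earlier rows.
    return idx - n * np.arange(m)[:, None]

events = np.array([[0.1, 0.2, 0.7, 0.93, 1.37],
                   [0.01, 0.01, 0.9, 2.5, 3.3]])
d = np.linspace(0, 1, 7)
print(searchsorted_rows(events, d))
```

One caveat of this design: adding large offsets to floating-point values can merge times that differ only in their last few digits, so results for near-ties may differ from a per-row search.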

Answer 2

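Given that your arrays are already sorted, one idea that beats a linear scan is to do a binary search for each evenly spaced value. Each search gives you the insertion point just past the last element that is less than or equal to the searched value, which is exactly the number of events up to that time. This can be done very efficiently with the bisect_right function from Python's built-in bisect module.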

bisect(a, x) returns an insertion point which comes after (to the right of) any existing entries of x in a.
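As a small illustration of that rule, using the second row from the question: with repeated values, bisect_right lands after the duplicates while bisect_left lands before them, which is why bisect_right gives the "events up to and including x" count.

```python
from bisect import bisect_left, bisect_right

row = [0.01, 0.01, 0.9, 2.5, 3.3]

print(bisect_right(row, 0.01))  # 2: both events at 0.01 are counted
print(bisect_left(row, 0.01))   # 0: events at exactly 0.01 are excluded
print(bisect_right(row, 1.0))   # 3: three events occurred by time 1
```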

The sample code could look like

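import numpy as np
from bisect import bisect_right

# example input from the question: each row holds sorted event times
your_array = np.array([[0.1, 0.2, 0.7, 0.93, 1.37],
                       [0.01, 0.01, 0.9, 2.5, 3.3]])
N = 10 # the number of evenly spaced sample times in [0, 1]
lin_vals = np.linspace(0., 1., N)
counts = []
for i in range(your_array.shape[0]):
    row = your_array[i]
    tmp = [] # the counts for this row
    for v in lin_vals:
        # bisect_right returns how many entries of row are <= v,
        # which is already the cumulative count up to time v
        idx = bisect_right(row, v)
        tmp.append(idx)
    counts.append(tmp)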

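I haven't tested this code, but it should give you the general idea. Doing it this way, you will have a complexity of about R*T*log(N), where R is the number of rows, T is the number of sample times, and N is the number of events in each row (not the N used in the code).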

Faster

If it's still not fast enough, consider cropping the array rows to remove the values greater than 1.

Next, you can speed up the binary search by restricting the search for the next linspaced value to row[prev_idx:].

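You could also try to gain speed by re-implementing bisect_right so that it also returns the upper-bound index it has already found, i.e. an index whose value is strictly greater than the next linspaced value you are going to process. That way you can bound the row on both sides, making it even faster!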
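The first two tips can be sketched with the optional lo and hi arguments that bisect_right already accepts, without slicing or copying the row. This is a rough sketch rather than a tuned implementation: the function name count_events_row and the t_max parameter are made up for illustration, and it assumes both the row and lin_vals are sorted in ascending order with all of lin_vals inside [0, t_max]. The third tip (a custom bisect that also returns an upper bound) is not attempted here.

```python
from bisect import bisect_right
import numpy as np

def count_events_row(row, lin_vals, t_max=1.0):
    """Cumulative event counts for one sorted row at ascending sample times."""
    # Tip 1: "crop" the row by never searching past the last value <= t_max.
    hi = bisect_right(row, t_max)
    counts = []
    prev_idx = 0
    for v in lin_vals:
        # Tip 2: the answer for v cannot lie before the previous answer,
        # so restrict the search to row[prev_idx:hi].
        prev_idx = bisect_right(row, v, prev_idx, hi)
        counts.append(prev_idx)
    return counts

lin_vals = np.linspace(0.0, 1.0, 7)
print(count_events_row([0.1, 0.2, 0.7, 0.93, 1.37], lin_vals))
print(count_events_row([0.01, 0.01, 0.9, 2.5, 3.3], lin_vals))
```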

Convert an array of event times into an array of event counts up to time x

2 answers: