Question

I have a block of code which does the following:

take a float from a list, b_lst, at index indx
check whether this float lies between the float at index i and the next one (index i+1) in the list a_lst
if it does, store indx in a sub-list of a third list (c_lst), where the index of that sub-list is the index of the left float in a_lst (i.e., i)
repeat for all floats in b_lst

Here is an MWE which shows what the code does:

import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N).tolist()

# Data lists.
# Note that a_lst is sorted.
a_lst = np.sort(random_data(1000))
b_lst = random_data(5000)
# Fixed index value (int)
c = 25

def func():
    # Create empty list with as many sub-lists as elements present
    # in a_lst beyond the 'c' index.
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]

    # For each element in b_lst.
    for indx,elem in enumerate(b_lst):

        # For elements in a_lst beyond the 'c' index.
        for i in range(len(a_lst[c:])-1):

            # Check if 'elem' is between this a_lst element
            # and the next.
            if a_lst[c+i] < elem <= a_lst[c+(i+1)]:

                # If it is then store the index of 'elem' ('indx')
                # in the 'i' sub-list of c_lst.
                c_lst[i].append(indx)

    return c_lst

print(func())
# time function.
func_time = timeit.timeit(func, number=10)
print(func_time)

This code works as it should, but I really need to improve its performance since it is slowing down the rest of my code.

Add

This is the optimized function based on the accepted answer. It's quite ugly, but it gets the job done.

def func_opt():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
    for elem in c_opt:
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1] = np.where(c_opt==elem)[0].tolist()
    return c_lst

In my tests this is ~7x faster than the original function.

Add 2

Much faster without using np.where:

def func_opt2():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
    for indx,elem in enumerate(c_opt):
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1].append(indx)
    return c_lst

This is ~130x faster than the original function.

Add 3

Following jtaylor's advice, I converted the result of np.searchsorted to a list with .tolist().

This is ~470x faster than the original function.
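
The function for Add 3 itself is not shown above. A minimal sketch of what it presumably looks like, assuming it is func_opt2 with the np.searchsorted result passed through .tolist() before the loop. The name func_opt3, the function arguments, and the smaller data sizes are assumptions made for illustration; the result is compared against the original double loop:

```python
import numpy as np

def func_brute(a_lst, b_lst, c):
    """Reference implementation: the double loop from the question."""
    n_bins = len(a_lst[c:]) - 1
    c_lst = [[] for _ in range(n_bins)]
    for indx, elem in enumerate(b_lst):
        for i in range(n_bins):
            if a_lst[c + i] < elem <= a_lst[c + i + 1]:
                c_lst[i].append(indx)
    return c_lst

def func_opt3(a_lst, b_lst, c):
    """Assumed form of the Add 3 variant: searchsorted plus .tolist()."""
    a_tail = a_lst[c:]
    c_lst = [[] for _ in range(len(a_tail) - 1)]
    # Looping over a plain list of Python ints is cheaper than looping
    # over a NumPy array one element at a time.
    c_opt = np.searchsorted(a_tail, b_lst, side='left').tolist()
    for indx, elem in enumerate(c_opt):
        # 0 means the value is <= a_tail[0]; len(a_tail) means it is
        # > a_tail[-1]. Both fall outside every interval.
        if 0 < elem < len(a_tail):
            c_lst[elem - 1].append(indx)
    return c_lst

# Smaller data than in the question so the brute-force check is quick.
rng = np.random.RandomState(0)
a_lst = np.sort(rng.uniform(0., 10., 200))
b_lst = rng.uniform(0., 10., 500).tolist()
c = 25

same = func_brute(a_lst, b_lst, c) == func_opt3(a_lst, b_lst, c)
print(same)
```

Passing the data as arguments rather than reading globals is only a convenience for the comparison; it does not change the technique.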

Answer 1

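You want to take a look at numpy's searchsorted. Calling

np.searchsorted(a_lst, b_lst, side='right')

will return an array of indices, the same length as b_lst, giving for each element the position in a_lst before which it would have to be inserted to preserve order. It will be very fast, because it uses binary search and the looping happens in C. You can then create your sub-arrays with fancy indexing, e.g.: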

>>> a = np.arange(1, 10)
>>> b = np.random.rand(100) * 10
>>> c = np.searchsorted(a, b, side='right')
>>> b[c == 0]
array([ 0.54620226,  0.40043875,  0.62398925,  0.40097674,  0.58765603,
        0.14045264,  0.16990249,  0.78264088,  0.51507254,  0.31808327,
        0.03895417,  0.92130027])
>>> b[c == 1]
array([ 1.34599709,  1.42645778,  1.13025996,  1.20096723,  1.75724448,
        1.87447058,  1.23422399,  1.37807553,  1.64118058,  1.53740299])
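
One detail worth checking when mapping this onto the question is the side argument. The question's condition a[i] < x <= a[i+1] corresponds to side='left', whereas side='right' gives a[i] <= x < a[i+1]; the two differ only for values that coincide exactly with an element of a. A small sketch with made-up values (the arrays a and b and the name bins are illustrative only) shows the difference and one way to collect the indices per interval:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.5])

# side='left':  returns k with a[k-1] <  x <= a[k]
# side='right': returns k with a[k-1] <= x <  a[k]
left = np.searchsorted(a, b, side='left')
right = np.searchsorted(a, b, side='right')
print(left.tolist())   # [0, 0, 1, 1, 2, 3]
print(right.tolist())  # [0, 1, 1, 2, 2, 3]

# Indices of b falling in each interval (a[k], a[k+1]], as in the question.
# Results 0 and len(a) lie outside all intervals and are skipped.
bins = [np.flatnonzero(left == k + 1).tolist() for k in range(len(a) - 1)]
print(bins)            # [[2, 3], [4]]
```

The list comprehension scans the index array once per interval, so for many intervals a single pass over the searchsorted result that appends each index to its sub-list may be the better choice.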
