Question

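I have two lists, L and C, both sorted from smallest to largest. L contains positive integers; C contains positive integers and positive decimals (e.g. 0.01, 0.05, ..., 100). The length of C is fixed at 6000+, while the length of L varies (between 2 and 3000).

The goal: given a constant M, find l in L and c in C such that l*c <= M and l*c is as close to M as possible.

Currently I use a for loop over C and a binary search on L to find the largest l*c that is <= M. However, it is very slow.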

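from bisect import bisect_right

candidate_list = []
for c in C:
    # binary search on L for the largest l with l*c <= M (key= needs Python 3.10+)
    i = bisect_right(L, M, key=lambda l: l * c) - 1
    if i >= 0:
        candidate_list.append(L[i] * c)
print(max(candidate_list))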

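If L has length N, each binary search takes O(log N). However, since C has 6000+ elements, the for loop over C is slow. And if I have multiple lists L of different lengths, the for loop becomes very slow. Is there a numpy or scipy package that can speed up this computation?

Note: because I have many lists L, I cannot simply do a numpy matrix multiplication between L and the transpose of C and use argmax to find the largest l*c that is <= M.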

Answer 1

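Because both lists are sorted, a linear algorithm is enough:

Traverse one list forward and find the best pair for item[A] in the second list (say at index K).

For the next item, item[A+1], the index of its pair is guaranteed to be less than or equal to the previous one (K), so you only need a single pass through the second list.

Pseudocode: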

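 best = None  # largest product found so far that is <= M
 iL = len(L) - 1
 for iC in range(len(C)):
     # C[iC] only grows, so the largest usable index into L can only shrink
     while iL >= 0 and L[iL] * C[iC] > M:
         iL -= 1
     if iL < 0:
         break  # even the smallest l is too large for this and every later c
     if best is None or L[iL] * C[iC] > best:
         best = L[iL] * C[iC]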
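
The claim that the paired index never increases can be checked on a small example. The sketch below is only an illustration: the helper name `best_index_for` and the sample values of L, C and M are made up for this purpose and do not come from the question. It uses a plain linear scan for each c, independent of the single-pass algorithm, and confirms that the resulting indices are non-increasing.

```python
def best_index_for(L, c, M):
    """Largest index i with L[i] * c <= M, or -1 if there is none.

    Hypothetical helper: a plain scan, used only to check the claim.
    """
    best = -1
    for i, l in enumerate(L):
        if l * c <= M:
            best = i
    return best


# Sample data (assumed for illustration), both lists sorted ascending.
L = [2, 3, 5, 8, 13]
C = [0.5, 1.0, 2.5, 4.0, 10.0]
M = 20

indices = [best_index_for(L, c, M) for c in C]
print(indices)  # [4, 4, 3, 2, 0]

# As c grows, the best index into L never moves to the right.
assert all(a >= b for a, b in zip(indices, indices[1:]))
```

Because the index only ever moves left, the inner while loop performs at most len(L) decrements in total over the whole run, which is why the combined cost is linear in len(L) + len(C).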

Answer 2

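User @Mbo made the key point in his answer:

Traverse one list forward and find the best pair for item[A] in the second list, but start searching from the back of the second list. For the next item, item[A+1], the index of its pair must be less than or equal to the previous one (K), so you only need a single pass through the second list.

Here is a sample implementation of the pseudocode he provided (linear time, bounded by the length of the longer list, which in the question is list C):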

def find(list_c, list_l, threshold):
    # best candidate pair for each element of 'list_c' (product <= 'threshold')
    possible_pairs = []

    j = len(list_l) - 1
    for i in range(len(list_c)):
        while list_c[i] * list_l[j] > threshold:
            # product is too big, pick a smaller element from 'list_l'
            j -= 1

            if j < 0:
                # exit while loop
                break

        if j < 0:
            # exit for loop
            break

        # we store some extra info here
        possible_pairs.append({
            'c_index': i,
            'c_elem': list_c[i],
            'l_index': j,
            'l_elem': list_l[j],
            'product': list_c[i] * list_l[j],
        })

    print(possible_pairs)

    # return the pair with the biggest product (closest to threshold),
    # or None if no pair satisfies the constraint
    return max(
        possible_pairs,
        key=lambda x: x['product'],
        default=None)

I also tested this solution:

import random

list_c = list(sorted(random.random()*100 for i in range(100)))
list_l = list(sorted(random.random()*100 for i in range(20)))
print('list_c', list_c)
print('list_l', list_l)

elem = find(list_c, list_l, threshold=50)

print('the best pair is')
print(elem)

The final printed output looks like this:

{
    'c_index': 47,
    'c_elem': 46.42324820342966,
    'l_index': 0,
    'l_elem': 1.0709460533705695,
    'product': 49.716794448105375,
}

As you can see, a similar solution can be applied in sequence to the many L lists mentioned in the question.
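
One possible way to do that is sketched below. It is a compact variant that returns only the best product rather than the full dictionary, and the name `best_product`, the dictionary `many_l` and the sample values are assumptions made for this illustration, not part of the original answer.

```python
def best_product(list_c, list_l, threshold):
    """Single-pass scan: largest c * l not exceeding threshold, or None."""
    best = None
    j = len(list_l) - 1
    for c in list_c:
        # c only grows, so j can only move towards the front of list_l
        while j >= 0 and c * list_l[j] > threshold:
            j -= 1
        if j < 0:
            break  # no element of list_l fits this or any later c
        product = c * list_l[j]
        if best is None or product > best:
            best = product
    return best


# Hypothetical sample data: one shared C and several L lists of different lengths.
list_c = [0.5, 1.0, 2.5, 4.0, 10.0]
many_l = {
    "short": [3, 7],
    "medium": [2, 3, 5, 8, 13],
    "too_big": [50, 60],
}

results = {name: best_product(list_c, list_l, 20)
           for name, list_l in many_l.items()}
print(results)  # {'short': 17.5, 'medium': 20.0, 'too_big': None}
```

Each call costs time proportional to len(list_c) + len(list_l), so the total over all lists grows with the sum of their lengths plus one pass over C per list.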

Answer 3

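Consider the numba package. It is designed specifically to speed up Python loops.

From its website: Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.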

Is there any Python package that can speed up loop computation?
