Question

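I have two lists that I need to compare and compute on, element by element. As the lists grow larger, performance keeps getting worse. Someone suggested splitting one of the lists into N parts and running the comparisons in parallel. How can I run these in parallel?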

key = {}
# compare each list, element by element
for i in range(len(list1)):
    for j in range(len(list2)):
        matched = False
        try:
            matched = match_function(list1[i]['typeforma'], list1[i]['typeformb'],
                                     list2[j]['typeforma'], list2[j]['typeformb'])
        except Exception:
            print("Error", i, j)
        if matched:
            # store the matched pair in the dictionary
            key[list2[j]['id']] = list1[i]['identifier']
            break

Answer 1

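Assuming you really do need to compare the Cartesian product (every element of list1 against every element of list2, rather than just each element of list1 against the corresponding element of list2), the simplest approach is to replace the outer loop with a call to map on a ProcessPoolExecutor or a Pool.

The only trick is that you don't want to try to share that mutable key dictionary; instead, pass back individual dicts and merge them at the end.

For example: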

import concurrent.futures

def compare_to_list2(i):
    # compare list1[i] against every element of list2; return its match (if any)
    key = {}
    for j in range(len(list2)):
        matched = False
        try:
            matched = match_function(list1[i]['typeforma'], list1[i]['typeformb'],
                                     list2[j]['typeforma'], list2[j]['typeformb'])
        except Exception:
            print("Error", i, j)
        if matched:
            # store the matched pair in the dictionary
            key[list2[j]['id']] = list1[i]['identifier']
            break
    return key

with concurrent.futures.ProcessPoolExecutor() as x:
    key = {}
    # merge the per-element dicts returned by the workers
    for result in x.map(compare_to_list2, range(len(list1)), chunksize=1024):
        key.update(result)

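Try experimenting with chunksize, but first, there are ways to improve this. To give just one example, you should iterate directly over list1 and list2 rather than over range(len(list1)) and range(len(list2)). Doing so not only makes things simpler, it is also more efficient, especially with large chunks. In fact, it's usually best to simplify first and then optimize.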
