Question

I am trying to use parallel processing on my list, which has the following format:

[ (key , { three values } )]

I want to group the keys whose value sets have any value in common:

Example: [ (1 , { 22,31,21 } ) , (2, {19,98,11}) , (3, {1,22,4})]

Expected output: (1,3), because {22,31,21}.intersection({1,22,4}) contains 22
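
To make the expected output concrete, here is a minimal sketch that reproduces it on the three-item example. The names `sample` and `overlapping_pairs` are illustrative only and do not appear in my actual code:

```python
from itertools import combinations

# The example list from above: (key, set of three values).
sample = [(1, {22, 31, 21}), (2, {19, 98, 11}), (3, {1, 22, 4})]

def overlapping_pairs(data):
    """Return sorted key pairs whose value sets share at least one element."""
    pairs = []
    # combinations() visits each unordered pair exactly once,
    # so a key is never compared with itself and no pair is repeated.
    for (key_a, values_a), (key_b, values_b) in combinations(data, 2):
        if values_a & values_b:  # non-empty intersection
            pairs.append(tuple(sorted((key_a, key_b))))
    return sorted(pairs)

print(overlapping_pairs(sample))  # [(1, 3)]
```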

Right now I am benchmarking a nested loop against parallel processing:

from multiprocessing import Pool
from tqdm import tqdm

With parallel processing:

def check_values(value):
    result=[]
    for j in fake_data:
        if len(value[1].intersection(j[1]))!=0:
            if j[0]!=value[0]:
                result.append((value[0],j[0]))
    return result

if __name__ == "__main__":

    agents= 5
    chunksize=1

    with Pool(processes=agents) as pool:
        result=pool.map(check_values,fake_data,chunksize)

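    # Kept inside the __main__ guard: under the "spawn" start method each
    # worker re-imports this module, and `result` would not exist there.
    datar=[tuple(sorted(j)) for i in result for j in i]
    print(len(sorted(set(datar))))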

With a nested loop (serial processing):

def without_pool():
    datar=[]

    for ivalue in tqdm(fake_data):
        for jvalue in fake_data:
            if len(ivalue[1].intersection(jvalue[1]))!=0:
                if ivalue[0]!=jvalue[0]:
                    datar.append((ivalue[0],jvalue[0]))
    return datar

sorted_result=[tuple(sorted(i)) for i in without_pool()]
print(len(sorted(set(sorted_result))))
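
For the comparison itself, a timing helper along these lines could be used. This is only a sketch: `time_call` is a hypothetical name, and the `sum` workload is a stand-in for `without_pool` or for a wrapper around the `pool.map` call:

```python
import time

def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the real benchmark func would be without_pool
# or a function that wraps the Pool.map call.
total, seconds = time_call(sum, range(1_000_000))
print(f"sum finished in {seconds:.4f} s")
```

Timing the pooled version this way includes the cost of starting the worker processes, so both numbers measure the whole run.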

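I expected the parallel approach to run faster, but it is actually slower than the serial one. Am I doing something wrong somewhere? Please advise: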

Below is the notebook file, which contains the data and all the code, so you can run it:

Parallel processing in Python: why is parallel processing slower than serial processing?

0 answers: