multiprocessing.Process the same speed as the original loop?

Asked: 2016-09-21 17:23:23

Tags: python multithreading multiprocessing

Suppose I have a function:

def support1(items, rows, output):
    # Fraction of rows whose whitespace-split tokens contain every item.
    n_rows = len(rows)
    if isinstance(items, (list, set)):
        count = float(sum(1 for row in rows if all(item in row.split() for item in items)))
    elif isinstance(items, str):
        count = float(sum(1 for row in rows if all(item in row.split() for item in items.split())))
    res = count / n_rows
    output.put(res)
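For reference, here is a minimal single-process run of this function (the sample rows are illustrative, not from the question). Note that `support1` does not return its result; it pushes the ratio onto the queue passed in as `output`:

```python
import queue

def support1(items, rows, output):
    # Fraction of rows whose whitespace-split tokens contain every item.
    n_rows = len(rows)
    if isinstance(items, (list, set)):
        count = float(sum(1 for row in rows if all(item in row.split() for item in items)))
    elif isinstance(items, str):
        count = float(sum(1 for row in rows if all(item in row.split() for item in items.split())))
    output.put(count / n_rows)

rows = ['apple banana pear', 'apple', 'banana pear']
q = queue.Queue()  # any object with a .put() method works for a single-process check
support1(['apple', 'banana'], rows, q)
result = q.get()
print(result)  # 1 of the 3 rows contains both 'apple' and 'banana' -> 0.333...
```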

I want to run this function over a list of 50,000 pairs of items, like this:

all_items = [['apple', 'banana'], ['apple', 'fruit'],
            ['apple', 'pear'], ['banana', 'pear'], ...]

against 10,000 transactions:

transactions = ['apple banana pear peach cream', 'apple banana pear', 'pear apple apple banana', 'pear banana', 'banana', 'apple', ...]

So, to compute the frequency of these pairs, I wrote something like this:

supports = [support1(pair, transactions, output) for pair in all_items]
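A side note on this comprehension (my observation, illustrated with a trimmed-down version of the function): since `support1` has no `return` statement, `supports` ends up as a list of `None` values; the computed ratios have to be drained from `output` instead:

```python
import queue

def support1(items, rows, output):
    # Trimmed to the list branch of the original function.
    n_rows = len(rows)
    count = float(sum(1 for row in rows if all(item in row.split() for item in items)))
    output.put(count / n_rows)

transactions = ['apple banana', 'apple', 'banana']
output = queue.Queue()
supports = [support1(pair, transactions, output) for pair in [['apple'], ['banana']]]
print(supports)  # [None, None] -- the results live in the queue, not in supports
results = [output.get() for _ in range(2)]
print(results)   # each of 'apple' and 'banana' appears in 2 of the 3 rows
```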

This takes forever on my machine (obviously). I can't convert transactions to a set. I tried spinning up some parallel processes, but they take just as long as the counts list comprehension. Here is my parallel code:

import multiprocessing
output = multiprocessing.Queue()

processes = [multiprocessing.Process(target=support1, args=(pair, transactions, output)) for pair in all_items]

for p in processes:
    p.start()

That final for loop takes forever... Am I missing something about the multiprocessing module? I've done parallel processing before and it was never this bad.

0 Answers:

No answers yet