Which Python patterns can be used for parallelization?

Asked: 2018-11-20 12:23:45

Tags: python multithreading python-2.7 parallel-processing joblib

cmd is a function that processes an argument x and prints its output to stdout. For example, it might be

def cmd(x):
  print(x)

A serial program that calls cmd() looks like this.

for x in array:
  cmd(x)

To speed the program up, I would like it to run in parallel. The stdout output may arrive out of order, but the output for a single x must not be corrupted by the output of another x.

There are various ways to achieve this in Python. I found something like this.

from joblib import Parallel, delayed
Parallel(n_jobs=100)(delayed(cmd)(i) for i in range(100))

Is this the best way to do it in Python, in terms of simplicity/readability of the code and efficiency?

Also, the code above runs fine on Python 3, but not on Python 2, where it emits the following warning. Could this cause errors?

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/joblib/externals/loky/backend/semlock.py:217: RuntimeWarning: semaphore are broken on OSX, release might increase its maximal value
  "increase its maximal value", RuntimeWarning)

Thanks.

3 Answers:

Answer 0 (score: 1)

If you are using Python 3, you can use concurrent.futures from the standard library instead.

Consider the following usage:

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(100) as executor:
    for x in array:
        executor.submit(cmd, x)
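If cmd can return its output instead of printing it, the parent process can do all the printing itself, which sidesteps interleaving entirely. A minimal sketch of that variant (the pool size, the range of inputs, and the returned string format are illustrative, not from the question):

```python
import concurrent.futures

def cmd(x):
    # build the whole line in the worker; printing happens in the parent,
    # so one item's output can never be corrupted by another's
    return "line {}".format(x)

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(4) as executor:
        # executor.map yields results in input order, even though
        # the workers may finish in a different order
        for line in executor.map(cmd, range(10)):
            print(line)
```

Unlike submit, executor.map also preserves the input order of the results, if that matters for your use case.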

Answer 1 (score: 1)

Using threading from the standard library: https://docs.python.org/3/library/threading.html

import threading

def cmd(x):
    lock.acquire(blocking=True)
    print(x)
    lock.release()

lock = threading.Lock()

for i in range(100):
    t = threading.Thread(target=cmd, args=(i,))
    t.start()

Using a lock guarantees that the code between lock.acquire() and lock.release() is executed by only one thread at a time. The print function is already thread-safe in Python 3, so the output would not be interrupted even without the lock. But you do need a lock if you share any state between threads (i.e. objects they modify).
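The acquire/release pair can also be written as a with block, which releases the lock even if the code inside raises an exception. A sketch of the same idea (the range of inputs is illustrative):

```python
import threading

lock = threading.Lock()

def cmd(x):
    # the with statement acquires the lock on entry and releases it
    # on exit, even if an exception is raised inside the block
    with lock:
        print(x)

threads = [threading.Thread(target=cmd, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Joining the threads at the end makes the program wait until all output has been produced before exiting.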

Answer 2 (score: 0)

I would solve the problem in the question with the following code (assuming we are talking about a CPU-bound operation):

import multiprocessing as mp
import random


def cmd(value):
    # some CPU heavy calculation
    for dummy in range(10 ** 8):
        random.random()
    # result
    return "result for {}".format(value)


if __name__ == '__main__':
    data = [val for val in range(10)]
    pool = mp.Pool(4)  # 4 is the number of worker processes (CPU cores used)
    # the result is available once all the data has been processed
    result = pool.map(cmd, data)

    print(result)

Output:

['result for 0', 'result for 1', 'result for 2', 'result for 3', 'result for 4', 'result for 5', 'result for 6', 'result for 7', 'result for 8', 'result for 9']

EDIT - an alternative implementation that delivers each result as soon as it is computed: Process and Queue instead of Pool and map:

import multiprocessing
import random


def cmd(value, result_queue):
    # some CPU heavy calculation
    for dummy in range(10 ** 8):
        random.random()
    # result
    result_queue.put("result for {}".format(value))


if __name__ == '__main__':

    data = [val for val in range(10)]
    results = multiprocessing.Queue()

    LIMIT = 3  # the maximum number of processes running at once (CPU cores used)
    counter = 0
    for val in data:
        counter += 1
        multiprocessing.Process(
            target=cmd,
            kwargs={'value': val, 'result_queue': results}
            ).start()
        if counter >= LIMIT:
            print(results.get())
            counter -= 1
    for dummy in range(LIMIT - 1):
        print(results.get())

Output:

result for 0
result for 1
result for 2
result for 3
result for 4
result for 5
result for 7
result for 6
result for 8
result for 9
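The Process/Queue pattern above can also be sketched more compactly with Pool.imap_unordered, which likewise yields each result as soon as its worker finishes (the pool size and input range are illustrative; the with form of Pool requires Python 3.3+):

```python
import multiprocessing as mp

def cmd(value):
    return "result for {}".format(value)

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # imap_unordered yields each result as soon as its worker
        # finishes, so the order may differ from the input order
        for res in pool.imap_unordered(cmd, range(10)):
            print(res)
```

This keeps the bounded worker count and the as-they-complete delivery, without managing the process counter and queue by hand.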