Simplest (least boilerplate) way to parallelize a Python loop?

Time: 2019-05-24 23:30:34

Tags: python python-3.x multithreading python-3.7

I have some code that looks like this:

for photo in photoInfo:
    if not('url' in photo):
        raise Exception("Missing URL: " + str(photo) + " in " + str(photoInfo))
    sizes = getImageSizes(photo['url'])
    photo.update(sizes)

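It may not be obvious, but for each photo the code performs a combination of high-latency I/O (opening a remote URL) and a moderately CPU-intensive step (parsing the image and extracting its dimensions).

What is the simplest way to parallelize this code?

What I have tried so far

I found this code in an answer to another, more complex question, but I am having trouble mapping it back to my simpler use case: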

from itertools import product
from multiprocessing import Pool

with Pool(processes=4) as pool:  # assuming Python 3
    pool.starmap(print, product(range(2), range(3), range(4)))

2 answers:

Answer 0 (score: 0)

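You can use Pool.map to fetch the image sizes in parallel, then merge each returned result back into its photo dict in the parent process (changes made inside a worker process are not visible to the parent):

from multiprocessing import Pool

def get_image_size(photo):
    if 'url' not in photo:
        raise Exception("Missing URL: " + str(photo))
    return getImageSizes(photo['url'])

if __name__ == '__main__':
    with Pool() as pool:
        # map() returns the results in the same order as photoInfo
        sizes_list = pool.map(get_image_size, photoInfo)
    # photoInfo holds dicts, which are unhashable and cannot serve as dict keys,
    # so update each photo in place as the original loop did
    for photo, sizes in zip(photoInfo, sizes_list):
        photo.update(sizes)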
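
Since most of the per-photo time in the question is spent waiting on remote URLs, the Pool.map pattern above can also be tried with threads instead of processes. The following is a minimal sketch under stated assumptions, not a drop-in solution: it uses multiprocessing.pool.ThreadPool, which exposes the same map interface as Pool, and writes each result back into its photo dict. The function get_image_sizes is a hypothetical stand-in for the question's getImageSizes (which is not shown), and the worker count of 8 is an arbitrary choice.

```python
from multiprocessing.pool import ThreadPool


def get_image_sizes(url):
    # Hypothetical stand-in for the question's getImageSizes(): the real
    # function would open the URL and parse the image. Sizes are faked here.
    return {"width": len(url), "height": 2 * len(url)}


def get_image_size(photo):
    if 'url' not in photo:
        raise Exception("Missing URL: " + str(photo))
    return get_image_sizes(photo['url'])


def add_sizes(photo_info, workers=8):
    # Threads share memory, so nothing has to be pickled; map() returns
    # the results in the same order as photo_info.
    with ThreadPool(workers) as pool:
        sizes_list = pool.map(get_image_size, photo_info)
    # Merge each result into the matching photo dict.
    for photo, sizes in zip(photo_info, sizes_list):
        photo.update(sizes)
    return photo_info


photos = [{"url": "http://example.com/a.jpg"},
          {"url": "http://example.com/bb.jpg"}]
add_sizes(photos)
print(photos)
```

Whether threads or processes end up faster depends on how much of the time goes to the CPU-bound parsing step, so it is worth measuring both on real data.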

Answer 1 (score: 0)

from multiprocessing import Pool
import os

def user_defined_function(url):
    #your logic for a single url
    pass

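if __name__ == '__main__':
    urls_list = ['u1','u2']
    # Create a multiprocessing pool with one worker per CPU core;
    # the with block shuts the pool down when the work is done
    with Pool(os.cpu_count()) as pool:
        results = pool.map(user_defined_function, urls_list)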

This is sample code that you can adapt to your own use. It maps each element of the list to your function and runs each call separately.
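
To illustrate what mapping each element to a function gives back, here is a small sketch. It uses concurrent.futures.ThreadPoolExecutor from the standard library, which is a different pool from the one in the answer, and slow_length is a made-up placeholder for the per-URL logic. The point it tries to show is that map hands back results in input order even when the individual calls finish in a different order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_length(url):
    # Placeholder for the per-URL logic: shorter strings sleep longer, so
    # calls tend to finish in a different order than they were submitted.
    time.sleep(0.05 / len(url))
    return len(url)


urls_list = ['u1', 'url-two', 'url-number-three']

with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns an iterator; list() collects the results,
    # which are ordered like urls_list regardless of completion order.
    results = list(executor.map(slow_length, urls_list))

print(results)
```

Because the results line up with the inputs, they can be paired with zip(urls_list, results) afterwards.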