Python multiprocessing pool.map for multiple arguments

Date: 2011-03-26 14:23:11

Tags: python multiprocessing

In the Python multiprocessing library, is there a variant of pool.map that supports multiple arguments?

text = "test"
def harvester(text, case):
    X = case[0]
    text+ str(X)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=6)
    case = RAW_DATASET
    pool.map(harvester(text,case),case, 1)
    pool.close()
    pool.join()

22 Answers:

Answer 0 (score: 396)

  

Is there a variant of pool.map which supports multiple arguments?

Python 3.3 includes the pool.starmap() method:

#!/usr/bin/env python3
from functools import partial
from itertools import repeat
from multiprocessing import Pool, freeze_support

def func(a, b):
    return a + b

def main():
    a_args = [1,2,3]
    second_arg = 1
    with Pool() as pool:
        L = pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
        M = pool.starmap(func, zip(a_args, repeat(second_arg)))
        N = pool.map(partial(func, b=second_arg), a_args)
        assert L == M == N

if __name__=="__main__":
    freeze_support()
    main()

For older versions:

#!/usr/bin/env python2
import itertools
from multiprocessing import Pool, freeze_support

def func(a, b):
    print a, b

def func_star(a_b):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return func(*a_b)

def main():
    pool = Pool()
    a_args = [1,2,3]
    second_arg = 1
    pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg)))

if __name__=="__main__":
    freeze_support()
    main()

Output:

1 1
2 1
3 1

Notice how itertools.izip() and itertools.repeat() are used here.

Due to the bug mentioned by @unutbu, you can't use functools.partial() or similar capabilities on Python 2.6, so the simple wrapper function func_star() should be defined explicitly. See also the workaround suggested by uptimebox.

Answer 1 (score: 245)

The answer to this depends on the version and the situation. The most general answer for recent versions of Python (since 3.3) was first described by J.F. Sebastian.1 It uses the Pool.starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

For earlier versions of Python, you'll need to write a helper function to unpack the arguments explicitly. If you want to use with, you'll also need to write a wrapper to turn Pool into a context manager. (Thanks to muon for pointing this out.)

import multiprocessing
from itertools import product
from contextlib import contextmanager

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def merge_names_unpack(args):
    return merge_names(*args)

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(merge_names_unpack, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

In simpler cases, with a fixed second argument, you can also use partial, but only in Python 2.7+.

import multiprocessing
from functools import partial
from contextlib import contextmanager

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(partial(merge_names, b='Sons'), names)
    print(results)

# Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...

1. Much of this was inspired by his answer, which should probably have been accepted instead. But since this one is stuck at the top, it seemed best to improve it for future readers.

Answer 2 (score: 121)

I think the following will be better:

def multi_run_wrapper(args):
   return add(*args)
def add(x,y):
    return x+y
if __name__ == "__main__":
    from multiprocessing import Pool
    pool = Pool(4)
    results = pool.map(multi_run_wrapper,[(1,2),(2,3),(3,4)])
    print results

Output:

[3, 5, 7]

Answer 3 (score: 45)

Using Python 3.3+ together with pool.starmap():

from multiprocessing.dummy import Pool as ThreadPool 

def write(i, x):
    print(i, "---", x)

a = ["1","2","3"]
b = ["4","5","6"] 

pool = ThreadPool(2)
pool.starmap(write, zip(a,b)) 
pool.close() 
pool.join()

Result:

1 --- 4
2 --- 5
3 --- 6

If you like, you can also zip() more arguments: zip(a, b, c, d, e).

If you want to have a constant value passed as an argument, you have to use import itertools and then, for example, zip(itertools.repeat(constant), a).
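
For example, here is a minimal sketch of that constant-argument pattern, reusing the write() function and thread pool from above (the constant "0" is just a placeholder):

import itertools
from multiprocessing.dummy import Pool as ThreadPool

def write(i, x):
    print(i, "---", x)

a = ["1", "2", "3"]

pool = ThreadPool(2)
# itertools.repeat supplies the constant; zip stops when `a` is exhausted,
# so write("0", "1"), write("0", "2"), write("0", "3") are called
pool.starmap(write, zip(itertools.repeat("0"), a))
pool.close()
pool.join()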

Answer 4 (score: 22)

Having learned about itertools in J.F. Sebastian's answer, I decided to take it a step further and write a parmap package that takes care of the parallelization, offering map and starmap functions on python-2.7 and python-3.2 (and later as well) that can take any number of positional arguments.

Installation:

pip install parmap

How to parallelize:

import parmap
# If you want to do:
y = [myfunction(x, argument1, argument2) for x in mylist]
# In parallel:
y = parmap.map(myfunction, mylist, argument1, argument2)

# If you want to do:
z = [myfunction(x, y, argument1, argument2) for (x,y) in mylist]
# In parallel:
z = parmap.starmap(myfunction, mylist, argument1, argument2)

# If you want to do:
listx = [1, 2, 3, 4, 5, 6]
listy = [2, 3, 4, 5, 6, 7]
param1 = 3.14
param2 = 42
listz = []
for (x, y) in zip(listx, listy):
        listz.append(myfunction(x, y, param1, param2))
# In parallel:
listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)

I have uploaded parmap to PyPI and to a github repository.

As an example, the question could be answered as follows:

import parmap

def harvester(case, text):
    X = case[0]
    text+ str(X)

if __name__ == "__main__":
    case = RAW_DATASET  # assuming this is an iterable
    parmap.map(harvester, case, "test", chunksize=1)

Answer 5 (score: 9)

There is a fork of multiprocessing called pathos (note: use the version on github) that doesn't need starmap -- its map functions mirror the API for python's map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. Pathos is due for a release after some mild updating -- mostly conversion to python 3.x.

  Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
  [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> def func(a,b):
  ...     print a,b
  ...
  >>>
  >>> from pathos.multiprocessing import ProcessingPool    
  >>> pool = ProcessingPool(nodes=4)
  >>> pool.map(func, [1,2,3], [1,1,1])
  1 1
  2 1
  3 1
  [None, None, None]
  >>>
  >>> # also can pickle stuff like lambdas 
  >>> result = pool.map(lambda x: x**2, range(10))
  >>> result
  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  >>>
  >>> # also does asynchronous map
  >>> result = pool.amap(pow, [1,2,3], [4,5,6])
  >>> result.get()
  [1, 32, 729]
  >>>
  >>> # or can return a map iterator
  >>> result = pool.imap(pow, [1,2,3], [4,5,6])
  >>> result
  <processing.pool.IMapIterator object at 0x110c2ffd0>
  >>> list(result)
  [1, 32, 729]

Answer 6 (score: 8)

You can use the following two functions so as to avoid writing a wrapper for each new function:

import itertools
from multiprocessing import Pool

def universal_worker(input_pair):
    function, args = input_pair
    return function(*args)

def pool_args(function, *args):
    return zip(itertools.repeat(function), zip(*args))

Then, to use the function function with lists of arguments arg_0, arg_1 and arg_2, do as follows:

pool = Pool(n_core)
list_model = pool.map(universal_worker, pool_args(function, arg_0, arg_1, arg_2))
pool.close()
pool.join()

Answer 7 (score: 6)

A better solution for python2:

from multiprocessing import Pool
def func((i, (a, b))):
    print i, a, b
    return a + b
pool = Pool(3)
pool.map(func, [(0,(1,2)), (1,(2,3)), (2,(3, 4))])

2 3 4

1 2 3

0 1 2

Out[]:

[3,5,7]

Answer 8 (score: 6)

A better way is to use a decorator instead of writing a wrapper function by hand. Especially when you have a lot of functions to map, a decorator will save your time by avoiding writing a wrapper for every function. Usually a decorated function is not picklable, but we may use functools to get around it. More discussion can be found here.

Here is an example:

def unpack_args(func):
    from functools import wraps
    @wraps(func)
    def wrapper(args):
        if isinstance(args, dict):
            return func(**args)
        else:
            return func(*args)
    return wrapper

@unpack_args
def func(x, y):
    return x + y

Then you may map it with zipped arguments:

from multiprocessing import Pool

np, xlist, ylist = 2, range(10), range(10)
pool = Pool(np)
res = pool.map(func, zip(xlist, ylist))
pool.close()
pool.join()

Of course, you can always use Pool.starmap in Python 3 (>= 3.3), as mentioned in other answers.

Answer 9 (score: 6)

Another simple alternative is to wrap your function parameters in a tuple, and then wrap the parameters that should be passed in tuples as well. This is perhaps not ideal when dealing with large pieces of data; I believe it would make copies for each tuple.


It gives the output in a random order:

from multiprocessing import Pool

def f((a,b,c,d)):
    print a,b,c,d
    return a + b + c +d

if __name__ == '__main__':
    p = Pool(10)
    data = [(i+0,i+1,i+2,i+3) for i in xrange(10)]
    print(p.map(f, data))
    p.close()
    p.join()

Answer 10 (score: 5)

Here is another way to do it that, IMHO, is simpler and more elegant than any of the other answers provided.

This program has a function that takes two parameters, prints them out, and also prints their sum:

import multiprocessing

def main():

    with multiprocessing.Pool(10) as pool:
        params = [ (2, 2), (3, 3), (4, 4) ]
        pool.starmap(printSum, params)
    # end with

# end function

def printSum(num1, num2):
    mySum = num1 + num2
    print('num1 = ' + str(num1) + ', num2 = ' + str(num2) + ', sum = ' + str(mySum))
# end function

if __name__ == '__main__':
    main()

The output is:

num1 = 2, num2 = 2, sum = 4
num1 = 3, num2 = 3, sum = 6
num1 = 4, num2 = 4, sum = 8

For more information, see the python documentation:

https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool

In particular, be sure to check out the starmap function.

I'm using Python 3.6; I'm not sure if this will work with older Python versions.

Why there isn't a very straightforward example like this in the documentation, I'm not sure.

Answer 11 (score: 4)

# "How to take multiple arguments".

def f1(args):
    a, b, c = args[0] , args[1] , args[2]
    return a+b+c

if __name__ == "__main__":
    import multiprocessing
    pool = multiprocessing.Pool(4) 

    result1 = pool.map(f1, [ [1,2,3] ])
    print(result1)

Answer 12 (score: 3)

Another way is to pass a list of lists to a one-argument routine:

import os
from multiprocessing import Pool

def task(args):
    print "PID =", os.getpid(), ", arg1 =", args[0], ", arg2 =", args[1]

pool = Pool()

pool.map(task, [
        [1,2],
        [3,4],
        [5,6],
        [7,8]
    ])

One can then construct a list of lists of arguments in one's favourite way.
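
For instance, a small Python 3 sketch of building such a list from two separate sequences with zip and a list comprehension:

from multiprocessing import Pool

def task(args):
    print("arg1 =", args[0], ", arg2 =", args[1])

xs = [1, 3, 5, 7]
ys = [2, 4, 6, 8]
# zip pairs the sequences element-wise: [[1, 2], [3, 4], [5, 6], [7, 8]]
arg_lists = [list(pair) for pair in zip(xs, ys)]

if __name__ == '__main__':
    pool = Pool()
    pool.map(task, arg_lists)
    pool.close()
    pool.join()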

Answer 13 (score: 2)

Since python 3.4.4, you can use multiprocessing.get_context() to obtain a context object that supports multiple start methods:

import multiprocessing as mp

def foo(q, h, w):
    q.put(h + ' ' + w)
    print(h + ' ' + w)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,'hello', 'world'))
    p.start()
    print(q.get())
    p.join()

Or you can simply replace

pool.map(harvester(text,case),case, 1)

with:

pool.apply_async(harvester(text,case),case, 1)
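
Note that apply_async is normally given the callable plus a tuple of its arguments, rather than the result of calling the function. A minimal sketch of that pattern, with placeholder values standing in for RAW_DATASET:

import multiprocessing

text = "test"

def harvester(text, case):
    # placeholder body: just combine the two arguments
    return text + str(case[0])

if __name__ == '__main__':
    cases = [[1], [2], [3]]  # placeholder standing in for RAW_DATASET
    pool = multiprocessing.Pool(processes=6)
    # one async call per case; each call receives (text, c) as its argument tuple
    async_results = [pool.apply_async(harvester, (text, c)) for c in cases]
    pool.close()
    pool.join()
    print([r.get() for r in async_results])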

Answer 14 (score: 0)

Store all of your arguments as an array of tuples.

For example, say you would normally call your function as:

def mainImage(fragCoord : vec2, iResolution : vec3, iTime : float) -> vec3:

Instead, pass one tuple and unpack the arguments:

def mainImage(package_iter) -> vec3: 
    fragCoord=package_iter[0]  
    iResolution=package_iter[1]
    iTime=package_iter[2]

Build up the tuples using a loop beforehand:

    package_iter = [] 
    iResolution = vec3(nx,ny,1)
    for j in range( (ny-1), -1, -1):
        for i in range( 0, nx, 1): 
            fragCoord : vec2 = vec2(i,j)
            time_elapsed_seconds = 10
            package_iter.append(  (fragCoord, iResolution, time_elapsed_seconds)  )

Then execute everything using map by passing the array of tuples:

    array_rgb_values = []

    with concurrent.futures.ProcessPoolExecutor() as executor: 
        for  val in executor.map(mainImage, package_iter):          
            fragColor=val
            ir = clip( int(255* fragColor.r), 0, 255)
            ig = clip(int(255* fragColor.g), 0, 255)
            ib= clip(int(255* fragColor.b), 0, 255)

            array_rgb_values.append( (ir,ig,ib) )

I know Python has * and ** for unpacking, but I haven't tried those yet. It is also better to use the higher-level concurrent.futures library than the low-level multiprocessing library.
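
As a side note, Executor.map in concurrent.futures already accepts one iterable per parameter, so a small sketch like the following (with a simplified stand-in for mainImage) needs no tuple packing at all:

import concurrent.futures

def shade(frag_coord, resolution, time_elapsed):
    # simplified stand-in for mainImage(): just echo the inputs back
    return (frag_coord, resolution, time_elapsed)

if __name__ == '__main__':
    coords = [(0, 0), (1, 0), (2, 0)]
    resolutions = [(3, 1, 1)] * len(coords)  # the constant value, repeated per call
    times = [10] * len(coords)

    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map() zips the three iterables and calls shade(c, r, t) for each triple
        for result in executor.map(shade, coords, resolutions, times):
            print(result)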

Answer 15 (score: 0)

This might be another option. The trick is in the wrapper function, which returns another function that is passed in to pool.map. The code below reads an input array, and for each (unique) element in it, returns how many times (i.e. counts) that element appears in the array. For example, if the input is

np.eye(3) = [ [1. 0. 0.]
              [0. 1. 0.]
              [0. 0. 1.]]

then zero appears 6 times and one appears 3 times:

import numpy as np
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import cpu_count


def extract_counts(label_array):
    labels = np.unique(label_array)
    out = extract_counts_helper([label_array], labels)
    return out

def extract_counts_helper(args, labels):
    n = max(1, cpu_count() - 1)
    pool = ThreadPool(n)
    results = {}
    pool.map(wrapper(args, results), labels)
    pool.close()
    pool.join()
    return results

def wrapper(argsin, results):
    def inner_fun(label):
        label_array = argsin[0]
        counts = get_label_counts(label_array, label)
        results[label] = counts
    return inner_fun

def get_label_counts(label_array, label):
    return sum(label_array.flatten() == label)

if __name__ == "__main__":
    img = np.ones([2,2])
    out = extract_counts(img)
    print('input array: \n', img)
    print('label counts: ', out)
    print("========")
           
    img = np.eye(3)
    out = extract_counts(img)
    print('input array: \n', img)
    print('label counts: ', out)
    print("========")
    
    img = np.random.randint(5, size=(3, 3))
    out = extract_counts(img)
    print('input array: \n', img)
    print('label counts: ', out)
    print("========")

You should get:

input array: 
 [[1. 1.]
 [1. 1.]]
label counts:  {1.0: 4}
========
input array: 
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
label counts:  {0.0: 6, 1.0: 3}
========
input array: 
 [[4 4 0]
 [2 4 3]
 [2 3 1]]
label counts:  {0: 1, 1: 1, 2: 2, 3: 2, 4: 3}
========

Answer 16 (score: 0)

import time
from multiprocessing import Pool


def f1(args):
    vfirst, vsecond, vthird = args[0] , args[1] , args[2]
    print(f'First Param: {vfirst}, Second value: {vsecond} and finally third value is: {vthird}')
    pass


if __name__ == '__main__':
    p = Pool()
    result = p.map(f1, [['Dog','Cat','Mouse']])
    p.close()
    p.join()
    print(result)

Answer 17 (score: 0)

There are lots of answers here, but none of them seem to provide Python 2/3 compatible code that will work on any version. If you want your code to just work, this will work for either Python version:

# For python 2/3 compatibility, define pool context manager
# to support the 'with' statement in Python 2
import sys
import multiprocessing

if sys.version_info[0] == 2:
    from contextlib import contextmanager
    @contextmanager
    def multiprocessing_context(*args, **kwargs):
        pool = multiprocessing.Pool(*args, **kwargs)
        yield pool
        pool.terminate()
else:
    multiprocessing_context = multiprocessing.Pool

After that, you can use multiprocessing the regular Python 3 way, however you like. For example:

def _function_to_run_for_each(x):
    return x.lower()

with multiprocessing_context(processes=3) as pool:
    results = pool.map(_function_to_run_for_each, ['Bob', 'Sue', 'Tim'])
    print(results)

will work in either Python 2 or Python 3.

Answer 18 (score: 0)

Here is an example of the routine I use to pass multiple arguments to a one-argument function used in a pool.imap fork:

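The sketch below only illustrates that pattern in general form, with a hypothetical process_item(name, factor) worker whose two arguments are bundled into one tuple so that pool.imap can pass a single argument:

from multiprocessing import Pool

def process_item(name, factor):
    # hypothetical worker that needs two arguments
    return name * factor

def process_item_one_arg(bundle):
    # pool.imap supplies exactly one argument, so unpack the tuple here
    return process_item(*bundle)

if __name__ == '__main__':
    bundles = [("a", 2), ("b", 3), ("c", 4)]
    with Pool(4) as pool:
        for result in pool.imap(process_item_one_arg, bundles):
            print(result)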

Answer 19 (score: 0)

text = "test"

def unpack(args):
    return args[0](*args[1:])

def harvester(text, case):
    X = case[0]
    text+ str(X)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=6)
    case = RAW_DATASET
    # args is a list of tuples 
    # with the function to execute as the first item in each tuple
    args = [(harvester, text, c) for c in case]
    # doing it this way, we can pass any function
    # and we don't need to define a wrapper for each different function
    # if we need to use more than one
    pool.map(unpack, args)
    pool.close()
    pool.join()

Answer 20 (score: 0)

The official documentation states that only a single iterable argument is supported. I like to use apply_async in such cases. In your case I would do:

from multiprocessing import Process, Pool, Manager

text = "test"
def harvester(text, case, q=None):
    X = case[0]
    res = text + str(X)
    if q:
        q.put(res)
    return res


def block_until(q, results_queue, until_counter=0):
    i = 0
    while i < until_counter:
        results_queue.put(q.get())
        i += 1

if __name__ == '__main__':
    pool = Pool(processes=6)
    case = RAW_DATASET
    m = Manager()
    q = m.Queue()
    results_queue = m.Queue()  # when it completes, results will reside in this queue
    blocking_process = Process(target=block_until, args=(q, results_queue, len(case)))
    blocking_process.start()
    for c in case:
        try:
            res = pool.apply_async(harvester, (text, case, q))
            res.get(timeout=0.1)
        except:
            pass
    blocking_process.join()

Answer 21 (score: -1)

For python2, you can use this trick:

import multiprocessing

def fun(a, b):
    return a+b

pool = multiprocessing.Pool(processes=6)
b=233
pool.map(lambda x:fun(x,b),range(1000))