Running functions with positional and optional arguments in parallel in Python (follow-up)

Asked: 2015-05-06 14:15:22

Tags: python python-2.7 parallel-processing multiprocessing

This is a follow-up question to: Python: How can I run python functions in parallel?

Minimal working example:

'''
Created on 06.05.2015
https://stackoverflow.com/questions/7207309/python-how-can-i-run-python-functions-in-parallel
'''
from multiprocessing import Process
import time

def runInParallel(*fns):
    proc = []
    for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()

def func1():
    s=time.time()
    print 'func1: starting', s 
    for i in xrange(1000000000):
        if i==i:
            pass
    e = time.time()
    print 'func1: finishing', e
    print 'duration', e-s

if __name__ == '__main__':
    s =time.time()
    runInParallel(func1, func1, func1, func1, func1)
    print time.time()-s

This produces the following output (which is exactly what I want):

func1: starting 1430920678.09
func1: starting 1430920678.53
func1: starting 1430920679.02
func1: starting 1430920679.57
func1: starting 1430920680.55
func1: finishing 1430920729.68
duration 51.1449999809
func1: finishing 1430920729.78
duration 51.6889998913
func1: finishing 1430920730.69
duration 51.1239998341
func1: finishing 1430920748.64
duration 69.6180000305
func1: finishing 1430920749.25
duration 68.7009999752
71.5629999638

However, my functions take quite a lot of arguments, so I tested it like this:

-> func1(a) is now passed an argument.

'''
Created on 06.05.2015
https://stackoverflow.com/questions/7207309/python-how-can-i-run-python-functions-in-parallel
'''
from multiprocessing import Process
import time

def runInParallel(*fns):
    proc = []
    for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()

def func1(a):
    s=time.time()
    print 'func1: starting', s 
    for i in xrange(a):
        if i==i:
            pass
    e = time.time()
    print 'func1: finishing', e
    print 'duration', e-s

if __name__ == '__main__':
    s =time.time()
    g=s
    runInParallel(func1(1000000000), func1(1000000000),
                  func1(1000000000), func1(1000000000),
                  func1(1000000000))
    print time.time()-s

Now this happens:

func1: starting 1430921299.08
func1: finishing 1430921327.84
duration 28.760999918
func1: starting 1430921327.84
func1: finishing 1430921357.68
duration 29.8410000801
func1: starting 1430921357.68
func1: finishing 1430921387.14
duration 29.4619998932
func1: starting 1430921387.14
func1: finishing 1430921416.52
duration 29.3849999905
func1: starting 1430921416.52
func1: finishing 1430921447.39
duration 30.864000082
151.392999887

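The processes now run sequentially and no longer in parallel, and I don't know why! What am I missing, and what am I doing wrong?

EDIT: Additionally, what would an example look like where only some of the arguments are positional and the others are optional?

3 answers:

Answer 0 (score: 5)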

You have to pass the arguments to Process using the args parameter. For example:

def runInParallel(*fns):
    proc = []
    for fn, arg in fns:
        p = Process(target=fn, args=(arg,))
        p.start()
        proc.append(p)
    for p in proc:
        p.join()

Then call the function using:

runInParallel((func1, 10**9),
              (func1, 10**9),
              (func1, 10**9))

Also, you might consider using a Pool instead:

from multiprocessing import Pool

pool = Pool()
pool.apply_async(func1, (10**9,))
pool.apply_async(func1, (10**9,))
pool.apply_async(func1, (10**9,))
# Wait for the submitted tasks before the main process exits.
pool.close()
pool.join()

EDIT:

Process and Pool.apply_async work the same way. Both take two optional arguments that hold the positional and keyword arguments for the target: args and kwargs for Process, args and kwds for Pool.apply_async. These mirror the standard Python variables for positional and keyword arguments:

f(1, 2, a=3, b=4)  # is equivalent to
args, kwargs = (1, 2), {"a": 3, "b": 4}
f(*args, **kwargs)

The same pair of containers can be handed to multiprocessing.
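As a rough sketch of how that might look with multiprocessing, the example below passes one args tuple and one kwargs dict to both Process and Pool.apply_async. The functions f and report are hypothetical stand-ins introduced only for illustration; report exists because a Process does not hand back its target's return value.

```python
from __future__ import print_function  # lets the sketch run on Python 2.7 and 3
from multiprocessing import Pool, Process


def f(x, y, a=0, b=0):
    """Hypothetical target: two positional and two optional arguments."""
    return x + y + a + b


def report(*args, **kwargs):
    """Print f's result, since a Process discards the target's return value."""
    print('f returned', f(*args, **kwargs))


if __name__ == '__main__':
    args, kwargs = (1, 2), {'a': 3, 'b': 4}

    # Process: positional arguments go in args, keyword arguments in kwargs.
    p = Process(target=report, args=args, kwargs=kwargs)
    p.start()
    p.join()

    # Pool.apply_async: the same two containers, passed as args and kwds.
    pool = Pool(2)
    async_result = pool.apply_async(f, args, kwargs)
    print('pool result', async_result.get())
    pool.close()
    pool.join()
```

Leaving kwargs out simply runs the target with its default values, which covers the case where only some arguments are positional.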

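Answer 1 (score: 2)

If you don't mind using a fork of multiprocessing, you can do something pretty cool with multiple arguments for the target of a parallel map. Here I build a function that requires 2 arguments, has one optional argument, and also takes *args and **kwds. I'll build a list of random-length inputs and run them in parallel.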

>>> from pathos.multiprocessing import ProcessingPool as PPool
>>> pmap = PPool().map
>>> from pathos.multiprocessing import ThreadingPool as TPool
>>> tmap = TPool().map
>>> import numpy
>>>
>>> # build a function with multiple arguments, some optional
>>> def do_it(x,y,z=1,*args,**kwds):
...   import time
...   import random
...   s = time.time()
...   print 'starting', s
...   time.sleep(random.random())
...   res = sum([x,y,z]+list(args)+kwds.values())
...   e = time.time()
...   print 'finishing', e
...   print 'duration', e-s
...   return res
... 
>>> # create a bunch of random-length arrays as input for do_it
>>> input = map(numpy.random.random, tmap(numpy.random.randint, [2]*5, [6]*5))
>>> input
[array([ 0.25178071,  0.68871176,  0.92305523,  0.47103722]), array([ 0.14214278,  0.16747431,  0.59177496,  0.79984192]), array([ 0.20061353,  0.94339813,  0.67396539,  0.99919187]), array([ 0.63974882,  0.46868301,  0.59963679,  0.97704561]), array([ 0.14515633,  0.97824495,  0.57832663,  0.34167116])] 

Now, let's get our results...

>>> # call do_it in parallel, with random-length inputs
>>> result = pmap(do_it, *input)
starting 1431039902.85
starting 1431039902.85
starting 1431039902.85
starting 1431039902.85
finishing 1431039903.21
finishing 1431039903.21
duration 0.358909130096
duration 0.35973405838
finishing 1431039903.21
finishing 1431039903.21
duration 0.359538078308
duration 0.358761072159
>>> result
[1.379442164896775, 3.2465121635066176, 3.3667590048477187, 3.5887877829029042]

Of course, if you want to be tricky, you can run a triply nested map all in one line.

>>> # do it, all in one line
>>> result = pmap(do_it, *map(numpy.random.random, tmap(numpy.random.randint, [2]*5, [6]*5)))
starting 1431040673.62
starting 1431040673.62
starting 1431040673.62
starting 1431040673.62
starting 1431040673.62
finishing 1431040673.73
finishing 1431040673.73
duration 0.110394001007
duration 0.111043930054
finishing 1431040673.73
duration 0.110962152481
finishing 1431040673.73
duration 0.110266923904
finishing 1431040673.74
duration 0.110939025879
>>> result
[1.9904591398425764, 1.932317817954369, 2.6365732054048432, 2.5168248011900047, 2.0410734229587968]

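And you probably don't need to use a blocking or serial map at all, in which case things will be really fast (I'm ignoring numpy random seeding here).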

>>> # get a non-blocking thread map and an asynchronous processing map
>>> itmap = TPool().imap
>>> apmap = PPool().amap
>>>
>>> # do it!
>>> result = apmap(do_it, *itmap(numpy.random.random, itmap(numpy.random.randint, [2]*5, [6]*5)))
starting 1431041250.33
starting 1431041250.33
starting 1431041250.33
finishing 1431041250.44
duration 0.110985040665
finishing 1431041250.44
duration 0.110254049301
finishing 1431041250.45
duration 0.110941886902
>>> result.get()
[3.6386644432719697, 0.43038222983159957, 3.6220901279963318]

Get pathos here: https://github.com/uqfoundation
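For readers who prefer to stay with the standard library, a multi-argument map can be approximated by zipping the input columns and unpacking each tuple inside a module-level helper. This is only a sketch and not the pathos API: unpack_and_call and parallel_map are hypothetical names, do_it is a simplified version of the function above, and keyword arguments are not handled.

```python
from __future__ import print_function  # lets the sketch run on Python 2.7 and 3
from multiprocessing import Pool


def do_it(x, y, z=1, *args):
    """Simplified version of do_it above: sum all positional arguments."""
    return sum((x, y, z) + args)


def unpack_and_call(packed):
    """Hypothetical helper: Pool.map passes a single item, so unpack it here."""
    return do_it(*packed)


def parallel_map(columns, processes=2):
    """Call do_it once per tuple of zip(*columns), spread over worker processes."""
    pool = Pool(processes)
    try:
        return pool.map(unpack_and_call, list(zip(*columns)))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    columns = [[1, 2, 3], [10, 20, 30]]  # x values, then y values; z keeps its default
    print(parallel_map(columns))  # [12, 23, 34]
```

The helper has to live at module level so that worker processes can find it by name.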

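Answer 2 (score: 1)

Problem

I think your problem comes from the fact that you pass a function handle in the first example, whereas in the second example you evaluate the function directly.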

func1

is not equivalent to

func1()
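To make the difference concrete, here is a minimal sketch with a hypothetical, much shorter func1. In the question's second version, the five func1(1000000000) calls run one after another in the parent process before runInParallel is even entered, and runInParallel then receives five None values.

```python
from __future__ import print_function  # lets the sketch run on Python 2.7 and 3


def func1(a=3):
    """Hypothetical stand-in for the question's func1; it returns nothing."""
    for i in range(a):
        pass


handle = func1     # the function object itself; nothing has run yet
result = func1(3)  # runs the loop right here, then evaluates to None

print(callable(handle))  # True: usable as a Process target
print(result is None)    # True: nothing is left for a Process to call
```

A Process whose target is None starts and exits without doing any work, which matches the sequential timings shown in the question.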

Solution

According to https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process, you have to pass your arguments separately:

p = Process(target=fn, args=(10000000,))
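If changing runInParallel is not desirable, one alternative (a different technique from the args parameter, not taken from the answers here) is to bind the arguments up front with functools.partial, which returns a new callable without running the function. The sketch below assumes a hypothetical count_to function in place of func1.

```python
from __future__ import print_function  # lets the sketch run on Python 2.7 and 3
from functools import partial
from multiprocessing import Process


def runInParallel(*fns):
    # Same helper as in the question: each entry is called with no arguments.
    proc = []
    for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()


def count_to(a, label='worker'):
    """Hypothetical stand-in for func1: one positional, one optional argument."""
    n = 0
    for _ in range(a):
        n += 1
    print(label, 'counted to', n)
    return n


if __name__ == '__main__':
    # partial() stores the arguments without calling count_to yet.
    runInParallel(partial(count_to, 1000),
                  partial(count_to, 1000, label='second'))
```

On platforms that spawn new processes instead of forking, the partial object has to be picklable, so the wrapped function should be defined at module level.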

Hope this helps.