Python multiprocessing process IDs

Date: 2014-11-01 22:12:36

Tags: python multiprocessing

I am using the Python multiprocessing Pool module to create a pool of processes and assign jobs to it.

I created 4 processes and assigned 2 jobs, but when I try to display the workers' process IDs, I see only a single process ID, "6952". Shouldn't it print 2 process IDs?

from multiprocessing import Pool
from time import sleep

def f(x):
    import os 
    print "process id = " , os.getpid()
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result  =  pool.map_async(f, (11,))   #Start job 1 
    result1 =  pool.map_async(f, (10,))   #Start job 2
    print "result = ", result.get(timeout=1)  
    print "result1 = ", result1.get(timeout=1)

Result:

result = process id =  6952
process id =  6952
 [121]
result1 =  [100]

2 Answers:

Answer 0 (score: 2)

This is just a timing issue. Windows needs to spawn the 4 processes in the Pool, which then need to start up, initialize, and prepare to consume from the Queue. On Windows, this requires each child process to re-import the __main__ module, and it requires the Queue instances used internally by the Pool to be unpickled in each child. All of this takes a non-trivial amount of time. Long enough, in fact, that both of your map_async() calls execute before all the processes in the Pool are even up and running. You can see this if you add some tracing to the function each worker in the Pool runs:

while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        print("getting {}".format(current_process()))
        task = get()  # This is getting the task from the parent process
        print("got {}".format(current_process()))

Output:
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id =  5145
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
result =  [121]
result1 =  [100]
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>

As you can see, Worker-1 starts up and consumes both tasks before workers 2-4 ever try to consume from the Queue. If you add a sleep call in the main process after instantiating the Pool, but before calling map_async, you'll see a different process handle each request:

getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
# <sleeping here>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
process id = 5183
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
process id = 5184
getting <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
getting <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>
result = [121]
result1 = [100]
got <ForkServerProcess(ForkServerPoolWorker-3, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-4, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-1, started daemon)>
got <ForkServerProcess(ForkServerPoolWorker-2, started daemon)>


(Note that the extra "getting" / "got" statements you see are sentinels being sent to each process to shut them down gracefully.)

Using Python 3.x on Linux, I can reproduce this behavior with the 'spawn' and 'forkserver' contexts, but not with 'fork'. Presumably that is because forking the child processes is much faster than spawning them and re-importing __main__.

Answer 1 (score: 0)

It does print 2 process IDs.

result = process id =  6952  <=== process id = 6952
process id =  6952  <=== process id = 6952
 [121]
result1 =  [100]

This is because your worker process finishes its job quickly and is then ready to handle another request.

result  =  pool.map_async(f, (11,))   #Start job 1 
result1 =  pool.map_async(f, (10,))   #Start job 2

In the code above, your worker finished job 1, went back to the pool, and was ready to pick up job 2 as well. This can happen for several reasons; the most common are that the other workers are busy, or that they are not ready yet.

Here is an example where we have 4 workers, but only one of them is ready immediately, so we know which one will get the work.

# https://gist.github.com/dnozay/b2462798ca89fbbf0bf4

from multiprocessing import Pool,Queue
from time import sleep

def f(x):
    import os 
    print "process id = " , os.getpid()
    return x*x

# Queue that will hold amount of time to sleep
# for each worker in the initialization
sleeptimes = Queue()
for times in [2,3,0,2]:
    sleeptimes.put(times)

# each worker will do the following init.
# before they are handed any task.
# in our case the 3rd worker won't sleep
# and get all the work.
def slowstart(q):
    import os
    num = q.get()
    print "slowstart: process id = {0} (sleep({1}))".format(os.getpid(),num)
    sleep(num)

if __name__ == '__main__':
    pool = Pool(processes=4,initializer=slowstart,initargs=(sleeptimes,))    # start 4 worker processes
    result  =  pool.map_async(f, (11,))   #Start job 1 
    result1 =  pool.map_async(f, (10,))   #Start job 2
    print "result = ", result.get(timeout=3)
    print "result1 = ", result1.get(timeout=3)

Example:

$ python main.py 
slowstart: process id = 97687 (sleep(2))
slowstart: process id = 97688 (sleep(3))
slowstart: process id = 97689 (sleep(0))
slowstart: process id = 97690 (sleep(2))
process id =  97689
process id =  97689
result =  [121]
result1 =  [100]