When should .join() be called on a process?

Asked: 2013-01-20 21:45:35

Tags: python multiprocessing

I'm reading various tutorials on the Python multiprocessing module and can't quite work out why/when process.join() should be called. For example, I stumbled upon this example:

import math
import multiprocessing
from multiprocessing import Queue

# factorize_naive(n) is assumed to be defined earlier in the tutorial
nums = range(100000)
nprocs = 4

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outdict = {}
    for n in nums:
        outdict[n] = factorize_naive(n)
    out_q.put(outdict)

# Each process will get 'chunksize' nums and a queue to put its output
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []

for i in range(nprocs):
    p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)],
                  out_q))
    procs.append(p)
    p.start()

# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

# Wait for all worker processes to finish
for p in procs:
    p.join()

print(resultdict)

As I understand it, process.join() blocks the calling process until the process whose join method was called has finished executing. I also believe that the child processes started in the code sample above finish executing once they complete the target function, that is, after pushing their results onto out_q. Finally, I believe out_q.get() blocks the calling process until there is a result to pull. So, considering the code:

resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

# Wait for all worker processes to finish
for p in procs:
    p.join()

the main process is blocked by the out_q.get() calls until every single worker process has finished pushing its results onto the queue. So by the time the main process exits the for loop, every child process should have completed execution, right?

If that's the case, is there any reason to call the p.join() methods at this point? Haven't all the worker processes already finished, so how does this make the main process "wait for all worker processes to finish"? I'm asking mainly because I've seen this pattern in several different examples, and I'm curious whether I've failed to understand something.

3 Answers:

Answer 0 (score: 18)

By the time you call join, all workers have put their results into the queue, but they have not necessarily returned, and their processes may not have terminated yet. They may or may not have done so, depending on timing.

Calling join makes sure that all processes are given the time to properly terminate.

Answer 1 (score: 18)

Try running this:

import math
import time
from multiprocessing import Queue
import multiprocessing

def factorize_naive(n):
    factors = []
    for div in range(2, int(n**.5)+1):
        while not n % div:
            factors.append(div)
            n //= div
    if n != 1:
        factors.append(n)
    return factors

nums = range(100000)
nprocs = 4

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outdict = {}
    for n in nums:
        outdict[n] = factorize_naive(n)
    out_q.put(outdict)

# Each process will get 'chunksize' nums and a queue to put its output
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []

for i in range(nprocs):
    p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)],
                  out_q))
    procs.append(p)
    p.start()

# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

time.sleep(5)

# Wait for all worker processes to finish
for p in procs:
    p.join()

print(resultdict)

time.sleep(15)

Then open the Task Manager. You should be able to see the 4 child processes go into a zombie state for a few seconds before being terminated by the OS (due to the join calls):

(screenshot: Task Manager showing the four child processes in a zombie state)

In more complex situations the child processes can stay in a zombie state forever (like the situation you were asking about in another question), and if you create enough child processes you can fill the process table, causing trouble for the OS (which may kill your main process to avoid failures).

Answer 2 (score: 1)

I'm not sure of the exact implementation details, but join also seems to be necessary for the process to actually be reflected as terminated (for example, after calling terminate). In the example here, if you don't call join after terminating a process, process.is_alive() returns True, even though the process was killed with a process.terminate() call.