My question is about an alternative to the join() function that avoids leaving already-terminated processes in a defunct (zombie) state when using Python 3's multiprocessing library. Is there a way to keep the child processes from terminating until they get the green light from the main process, so that they can terminate cleanly without going into a zombie state?
I prepared a quick illustration using the following code, which launches 19 processes; the first process takes 10 seconds of work and all the others take 3 seconds:
import os
import sys
import time
import multiprocessing as mp
from multiprocessing import Process

def exe(i):
    print(i)
    if i == 1:
        time.sleep(10)
    else:
        time.sleep(3)

procs = []
for i in range(1,20):
    proc = Process(target=exe, args=(i,))
    proc.start()
    procs.append(proc)

for proc in procs:
    print(proc) # <-- I'm blocked to join others till the first process finishes its work load
    proc.join()

print("finished")
If you run the script, you will see that all the other processes go into a zombie state until join() on the first process returns. This could make the system unstable or overloaded!
Thanks
Answer 0 (score: 1)
In this thread, Marko Rauhamaa wrote:
If you don't care to know when child processes exit, you can simply ignore the SIGCHLD signal:
import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
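That will prevent zombies from appearing.
POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD. (The original POSIX standard left the behavior of setting SIGCHLD to SIG_IGN unspecified. Note that even though the default disposition of SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN results in different treatment of zombie process children.)
Linux 2.6 conforms to the POSIX requirements. However, Linux 2.4 (and earlier) does not: if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored, that is, the call blocks until the next child terminates and then returns the process ID and status of that child.
So if you are using Linux 2.6 or a POSIX-compliant OS, using the above code will allow child processes to exit without becoming zombies. If you are not using a POSIX-compliant OS, then the thread above offers a number of options. Below is one alternative, somewhat similar to Marko Rauhamaa's third suggestion.
If for some reason you need to know when child processes exit and wish to handle (at least some of them) differently, then you could set up a queue to allow the child processes to signal the main process when they are done. Then the main process can call the appropriate join in the order in which it receives items from the queue: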
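Before moving on to that alternative, here is a minimal sketch of the SIGCHLD approach, for anyone who wants to see the effect. It assumes Linux (it reads /proc/<pid>/stat to inspect the process state) and selects the fork start method explicitly. The helper names work, process_state and child_states are made up for this illustration and are not part of multiprocessing.

```python
import multiprocessing as mp
import signal
import time


def work(seconds):
    """Stand-in workload for a child process."""
    time.sleep(seconds)


def process_state(pid):
    """Return the Linux state letter of pid (e.g. 'S', 'Z'), or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format is "pid (comm) state ..."; the state follows the last ')'.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return None


def child_states(ignore_sigchld, n=3):
    """Start n short-lived children and report their states before any join()."""
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    previous = signal.getsignal(signal.SIGCHLD)
    if previous is None:  # handler was not installed from Python
        previous = signal.SIG_DFL
    if ignore_sigchld:
        signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        procs = [ctx.Process(target=work, args=(0.1,)) for _ in range(n)]
        for p in procs:
            p.start()
        time.sleep(1.0)  # children have exited by now; nobody has joined them
        states = [process_state(p.pid) for p in procs]
        for p in procs:
            p.join()  # reaps any zombies; returns at once if already gone
        return states
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old disposition


if __name__ == "__main__":
    print("default SIGCHLD:", child_states(ignore_sigchld=False))
    print("ignored SIGCHLD:", child_states(ignore_sigchld=True))
```

With SIGCHLD ignored, the kernel discards each child's exit status, so the parent should not rely on proc.exitcode afterwards. That is the trade-off for never having zombies.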
import time
import multiprocessing as mp

def exe(i, q):
    try:
        print(i)
        if i == 1:
            time.sleep(10)
        elif i == 10:
            raise Exception('I quit')
        else:
            time.sleep(3)
    finally:
        q.put(mp.current_process().name)

if __name__ == '__main__':
    procs = dict()
    q = mp.Queue()
    for i in range(1,20):
        proc = mp.Process(target=exe, args=(i, q))
        proc.start()
        procs[proc.name] = proc

    while procs:
        name = q.get()
        proc = procs[name]
        print(proc)
        proc.join()
        del procs[name]

    print("finished")
which yields a result like:
...
<Process(Process-10, stopped[1])> # <-- process with exception still gets joined
19
<Process(Process-2, started)>
<Process(Process-4, stopped)>
<Process(Process-6, started)>
<Process(Process-5, stopped)>
<Process(Process-3, stopped)>
<Process(Process-9, started)>
<Process(Process-7, stopped)>
<Process(Process-8, started)>
<Process(Process-13, started)>
<Process(Process-12, stopped)>
<Process(Process-11, stopped)>
<Process(Process-16, started)>
<Process(Process-15, stopped)>
<Process(Process-17, stopped)>
<Process(Process-14, stopped)>
<Process(Process-18, started)>
<Process(Process-19, stopped)>
<Process(Process-1, started)> # <-- Process-1 ends last
finished
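A variation on the same idea, offered only as a sketch: instead of having each child report through a queue, the main process can wait on each Process object's sentinel with multiprocessing.connection.wait and join children in the order they actually exit. This also covers a child that is killed before it can put anything on a queue. The names join_in_completion_order and run_demo, the worker-N process names, and the shortened sleep times are assumptions made for this illustration. The fork start method is chosen explicitly to keep the sketch self-contained on POSIX.

```python
import multiprocessing as mp
import time
from multiprocessing.connection import wait


def exe(i):
    # Worker 1 is the slow one, mirroring the example above (times shortened).
    time.sleep(0.6 if i == 1 else 0.1)


def join_in_completion_order(procs):
    """Join each started process as soon as it exits; return names in that order."""
    pending = {p.sentinel: p for p in procs}  # sentinel becomes ready on exit
    finished = []
    while pending:
        # wait() blocks until at least one sentinel is ready.
        for sentinel in wait(list(pending)):
            proc = pending.pop(sentinel)
            proc.join()  # returns promptly: the child has already exited
            finished.append(proc.name)
    return finished


def run_demo(n=4):
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    procs = [ctx.Process(target=exe, args=(i,), name=f"worker-{i}")
             for i in range(1, n + 1)]
    for p in procs:
        p.start()
    return join_in_completion_order(procs)


if __name__ == "__main__":
    print(run_demo())  # the slow worker-1 should appear last
```

Because every child is joined shortly after it exits, none of them lingers as a zombie while the slow first process is still running.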