I am writing a program that watches a particular directory for new files containing download URLs. Once a new file is detected, it creates a new process to do the actual download while the parent keeps watching the directory. I am using the Process interface from multiprocessing. The problem I am running into is that unless I call process.join(), the finished child process hangs around; but process.join() is a blocking call, which defeats the purpose of creating a child to handle the download.
My question is: is there a way to join the child process in a non-blocking manner, so the parent can carry on doing its thing?
Partial code:
import os
import time
from multiprocessing import Process

def main(argv):
    # parse command line args
    ...
    # set up variables
    ...
    watch_dir(watch_dir, download_dir)

def watch_dir(wDir, dDir):
    # Grab the current watch directory listing
    before = dict([(f, None) for f in os.listdir(wDir)])
    # Loop FOREVER
    while 1:
        # sleep for 10 secs
        time.sleep(10)
        # Grab the current dir listing
        after = dict([(f, None) for f in os.listdir(wDir)])
        # Get the list of new files
        added = [f for f in after if f not in before]
        # Get the list of deleted files
        removed = [f for f in before if f not in after]
        if added:
            # We have new files, do your stuff
            print("Added:", ", ".join(added))
            # Call the new process for downloading
            p = Process(target=child, args=(added, wDir, dDir))
            p.start()
            p.join()
        if removed:
            # tell the user the file was deleted
            print("Removed:", ", ".join(removed))
        # Set before to the current
        before = after

def child(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # exit cleanly
    os._exit(0)
The parent waits for the child to finish executing and then continues after p.join(), which is (as far as I can tell) correct. But that defeats the whole purpose of creating the child. If I leave out p.join(), the child lingers after it finishes, and ps ax | grep python shows me 'python <defunct>'.
I would like the child to finish what it is doing and go away without holding up the parent. Is there a way to do this?
Answer 0 (score: 14)
You can set up a separate thread to do the joining. Have it listen on a queue into which you push the subprocess handles:
from threading import Thread

class Joiner(Thread):
    def __init__(self, q):
        super().__init__()
        self.__q = q

    def run(self):
        while True:
            child = self.__q.get()
            if child is None:
                return
            child.join()
Then, instead of p.join(), call joinq.put(p), and call joinq.put(None) to signal the thread to stop. Make sure you use a FIFO queue. A minimal wiring sketch follows.
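For concreteness, here is a rough sketch of how the Joiner might be wired into the question's watch_dir() loop; the name joinq comes from the answer's wording, and the rest reuses the question's names:

from queue import Queue

joinq = Queue()          # FIFO queue of child-process handles
joiner = Joiner(joinq)
joiner.start()

# inside the watch loop, instead of p.join():
p = Process(target=child, args=(added, wDir, dDir))
p.start()
joinq.put(p)             # the Joiner thread reaps it when it finishes

# on shutdown:
joinq.put(None)          # tells the Joiner thread to exit
joiner.join()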
Answer 1 (score: 6)
In your while loop, call
multiprocessing.active_children()
which returns a list of all live children of the current process. Calling it has the side effect of "joining" any processes that have already finished.
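Applied to the question's loop, the call slots in like this (a sketch; everything except the active_children() line is the question's own code, minus p.join()):

import multiprocessing

while 1:
    time.sleep(10)
    # side effect: joins (reaps) any children that have already finished
    multiprocessing.active_children()
    ...  # diff the directory listing and p.start() new downloads as before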
Answer 2 (score: 3)
Rather than trying to shoehorn multiprocessing.Process() into working for you, perhaps you should use a different tool, like apply_async() with a multiprocessing.Pool():
import multiprocessing
import os
import time

def main(argv):
    # parse command line args
    ...
    # set up variables
    ...
    # set up multiprocessing Pool
    pool = multiprocessing.Pool()
    try:
        watch_dir(watch_dir, download_dir, pool)
    # catch whatever kind of exception you expect to end your infinite loop
    # you can omit this try/except if you really think your script will
    # run "forever" and you're okay with zombies should it crash
    except KeyboardInterrupt:
        pool.close()
        pool.join()

def watch_dir(wDir, dDir, pool):
    # Grab the current watch directory listing
    before = dict([(f, None) for f in os.listdir(wDir)])
    # Loop FOREVER
    while 1:
        # sleep for 10 secs
        time.sleep(10)
        # Grab the current dir listing
        after = dict([(f, None) for f in os.listdir(wDir)])
        # Get the list of new files
        added = [f for f in after if f not in before]
        # Get the list of deleted files
        removed = [f for f in before if f not in after]
        if added:
            # We have new files, do your stuff
            print("Added:", ", ".join(added))
            # launch the function in a subprocess - this is NON-BLOCKING
            pool.apply_async(child, (added, wDir, dDir))
        if removed:
            # tell the user the file was deleted
            print("Removed:", ", ".join(removed))
        # Set before to the current
        before = after

def child(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # simply return to "exit cleanly"
    return
multiprocessing.Pool() is a pool of worker subprocesses that you can submit "jobs" to. A pool.apply_async() call asks one of those subprocesses to run your function with the given arguments, asynchronously; nothing needs to be joined until your script has finished all of its work and closes the whole pool. The library manages the details for you.
I think this will serve you better than the currently accepted answer, for these reasons:
1. It removes the unnecessary complexity of launching extra threads and queues just to manage subprocesses.
2. It uses library routines that were made specifically for this purpose, so you get the benefit of future library improvements.
3. IMHO, it is much more maintainable.
4. It is more flexible. If you one day decide that you actually want to see a return value from your subprocesses, you can store the return value of the apply_async() call (a result object) and check it whenever you like; you could keep a bunch of them in a list and process them as a batch once the list grows past a certain size (see the sketch after this list). You can move the creation of the pool into the watch_dir() function and do away with the try/except if you really don't care what happens when the "infinite" loop is interrupted. And if you put some kind of break condition into the (presently) infinite loop, you can simply add pool.close() and pool.join() after the loop and everything is cleaned up.
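As a minimal sketch of the result-object idea from point 4 (names like pending are hypothetical, not from the answer):

pending = []  # AsyncResult objects returned by apply_async()

# in the watch loop, keep the handle instead of discarding it:
pending.append(pool.apply_async(child, (added, wDir, dDir)))

# once the list grows past some size, process finished ones as a batch:
if len(pending) > 10:
    still_pending = []
    for r in pending:
        if r.ready():
            r.get()  # returns child()'s value; re-raises any exception it hit
        else:
            still_pending.append(r)
    pending = still_pending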
Answer 3 (score: 2)
If you don't care about when and whether the child terminates, and you just want to avoid the child ending up as a zombie process, then you can do a double fork, so that the grandchild ends up being a child of init. In code:
import os
from multiprocessing import Process

def child(*args):
    p = Process(target=grandchild, args=args)
    p.start()
    os._exit(0)

def grandchild(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # exit cleanly
    os._exit(0)
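Note that with this approach the p.join() in the question's watch_dir() stops being a problem: child() exits as soon as it has started the grandchild, so the join returns almost immediately while the grandchild carries on with the download. A sketch, reusing the question's names:

# in watch_dir(), unchanged from the question:
p = Process(target=child, args=(added, wDir, dDir))
p.start()
p.join()  # cheap now: only reaps the short-lived intermediate child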
Answer 4 (score: 1)
You can also use multiprocessing.Process with daemon=True (a daemonic process); the process.start() method does not block, so your parent process can keep working without waiting for its child to finish.
The only caveat is that daemonic processes are not allowed to spawn children of their own.
from multiprocessing import Process

# my_func is a stand-in for whatever performs the actual download
child_process = Process(
    target=my_func,
    daemon=True,
)
child_process.start()
# Keep doing your stuff
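One trade-off to be aware of: when the parent process exits, it attempts to terminate all of its daemonic children, so a download still in flight when the watcher shuts down will be killed rather than allowed to finish.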