Python: joining a process without blocking the parent

Date: 2011-03-06 13:50:28

Tags: python multiprocessing

I'm writing a program that watches a particular directory for new files containing download URLs. Once a new file is detected, it creates a new process to perform the actual download while the parent continues to watch the directory. I'm using the Process interface from multiprocessing. The problem I have is that unless I call process.join(), the child process is still running, but process.join() is a blocking function, which defeats the purpose of creating a child to handle the actual download.

My question is: is there a way to join the child process in a non-blocking manner that allows the parent to keep doing its thing?

Partial code:

def main(argv):
  # parse command line args
  ...
  # set up variables
  ...
  watch_dir(watch_dir, download_dir)


def watch_dir(wDir, dDir):
  # Grab the current watch directory listing
  before = dict([(f, None) for f in os.listdir(wDir)])

  # Loop FOREVER
  while 1:
    # sleep for 10 secs
    time.sleep(10)

    # Grab the current dir listing
    after = dict([(f, None) for f in os.listdir(wDir)])

    # Get the list of new files
    added = [f for f in after if f not in before]
    # Get the list of deleted files
    removed = [f for f in before if f not in after]

    if added:
      # We have new files, do your stuff
      print "Added: ", ", ".join(added)

      # Call the new process for downloading
      p = Process(target=child, args=(added, wDir, dDir))
      p.start()
      p.join()

    if removed:
      # tell the user the file was deleted
      print "Removed: ", ", ".join(removed)

    # Set before to the current
    before = after

def child(filename, wDir, dDir):
  # Open filename and extract the url
  ...
  # Download the file to the dDir directory
  ...
  # Delete filename from the watch directory
  ...
  # exit cleanly
  os._exit(0)

The parent waits for the child to finish executing and then continues past p.join(), which works correctly as far as I can tell. But that defeats the whole purpose of creating the child. If I leave out p.join(), the child remains active and ps ax | grep python shows 'python <defunct>'.

I'd like the child to finish what it's doing and go away without holding up the parent. Is there a way to do that?

5 Answers:

Answer 0 (Score: 14)

You could set up a separate thread to do the joining. Have it listen on a queue into which you push the child-process handles:

from threading import Thread

class Joiner(Thread):
    def __init__(self, q):
        Thread.__init__(self)  # required before the thread can be started
        self.__q = q
    def run(self):
        while True:
            child = self.__q.get()
            if child is None:  # sentinel value: stop the joiner thread
                return
            child.join()

Then, instead of p.join(), do joinq.put(p), and do joinq.put(None) to signal the thread to stop. Make sure to use a FIFO queue.
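For reference, a minimal sketch of how this wires together (joinq is the queue named above; child, added, wDir, and dDir come from the question's code):

from Queue import Queue  # FIFO queue; the module is named queue on Python 3

joinq = Queue()
joiner = Joiner(joinq)
joiner.start()

# ... in the watch loop, instead of p.join():
p = Process(target=child, args=(added, wDir, dDir))
p.start()
joinq.put(p)     # the Joiner thread reaps this child when it finishes

# ... at shutdown:
joinq.put(None)  # sentinel: tells the Joiner thread to stop
joiner.join()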

Answer 1 (Score: 6)

In your while loop, call

multiprocessing.active_children()

This returns a list of all live children of the current process. Calling it has the side effect of "joining" any processes that have already finished.
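A minimal sketch of the change in the loop from the question (only the join-related lines differ):

import multiprocessing

# inside the `if added:` block, start the child but do not join it:
p = multiprocessing.Process(target=child, args=(added, wDir, dDir))
p.start()

# then, once per iteration of the while loop, reap finished children:
multiprocessing.active_children()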

Answer 2 (Score: 3)

Rather than trying to get multiprocessing.Process() to work for you, perhaps you should use a different tool, like apply_async() with a multiprocessing.Pool():
def main(argv):
    # parse command line args
    ...
    # set up variables
    ...

    # set up multiprocessing Pool
    pool = multiprocessing.Pool()

    try:
        watch_dir(watch_dir, download_dir, pool)

    # catch whatever kind of exception you expect to end your infinite loop
    # you can omit this try/except if you really think your script will 
    # run "forever" and you're okay with zombies should it crash
    except KeyboardInterrupt:
        pool.close()
        pool.join()

def watch_dir(wDir, dDir, pool):
    # Grab the current watch directory listing
    before = dict([(f, None) for f in os.listdir(wDir)])

    # Loop FOREVER
    while 1:
        # sleep for 10 secs
        time.sleep(10)

        # Grab the current dir listing
        after = dict([(f, None) for f in os.listdir(wDir)])

        # Get the list of new files
        added = [f for f in after if f not in before]
        # Get the list of deleted files
        removed = [f for f in before if f not in after]

        if added:
            # We have new files, do your stuff
            print "Added: ", ", ".join(added)

            # launch the function in a subprocess - this is NON-BLOCKING
            pool.apply_async(child, (added, wDir, dDir))

        if removed:
            # tell the user the file was deleted
            print "Removed: ", ", ".join(removed)

        # Set before to the current
        before = after

def child(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # simply return to "exit cleanly"
    return

multiprocessing.Pool() is a pool of worker subprocesses that you can submit "jobs" to. The pool.apply_async() call causes one of those subprocesses to run your function with the arguments provided, asynchronously, and it doesn't need to be joined until your script is done with all of its work and closes the whole pool. The library manages the details for you.

I think this will serve you better than the currently accepted answer, for these reasons:

1. It removes the unnecessary complexity of launching extra threads and queues just to manage subprocesses.
2. It uses library routines that are made specifically for this purpose, so you get the benefit of future library improvements.
3. IMHO, it is much more maintainable.
4. It is more flexible. If you one day decide that you actually want to see a return value from your subprocesses, you can store the return value from the apply_async() call (a result object) and check it whenever you like; you could store a bunch of them in a list and process them as a batch when the list gets above a certain size (see the sketch below). You can move the creation of the pool into the watch_dir() function and do away with the try/except if you don't really care what happens when the "infinite" loop is interrupted. And if you put some kind of break condition into the (currently) infinite loop, you can simply add pool.close() and pool.join() after the loop and everything is cleaned up.
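For illustration, a sketch of the result bookkeeping described in point 4 (the pending list and the batch size of 10 are illustrative assumptions, not part of the original answer):

pending = []

# in the watch loop, keep the result object from each submission:
result = pool.apply_async(child, (added, wDir, dDir))
pending.append(result)

# periodically process finished results as a batch:
if len(pending) > 10:
    for r in [r for r in pending if r.ready()]:
        r.get()           # re-raises any exception raised in the child
        pending.remove(r)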

Answer 3 (Score: 2)

If you don't care about when and whether the child terminates, and you just want to avoid the child ending up as a zombie process, you can do a double fork so that the grandchild ends up being a child of init. In code:

import os
from multiprocessing import Process

def child(*args):
  # Fork the grandchild, then exit immediately; the orphaned grandchild
  # is reparented to init, which reaps it, so no zombie is left behind.
  p = Process(target=grandchild, args=args)
  p.start()
  os._exit(0)

def grandchild(filename, wDir, dDir):
  # Open filename and extract the url
  ...
  # Download the file to the dDir directory
  ...
  # Delete filename from the watch directory
  ...
  # exit cleanly
  os._exit(0)
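In the watch loop the parent starts child exactly as before; it can even join() it immediately, because child does nothing but fork the grandchild and exit, so the join returns almost at once (a sketch, assuming the watch_dir() loop from the question):

p = Process(target=child, args=(added, wDir, dDir))
p.start()
p.join()  # returns quickly: child only forks the grandchild and exits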

Answer 4 (Score: 1)

You can also use multiprocessing.Process with daemon=True (a daemonic process); the process.start() method doesn't block, so your parent process can keep working without having to wait for its child to finish.

The only caveat is that daemonic processes are not allowed to spawn children of their own.

from multiprocessing import Process

child_process = Process(
    target=my_func,
    daemon=True
)
child_process.start()
# Keep doing your stuff