I am writing a program that watches a particular directory for new files containing download URLs. Once a new file is detected, it creates a new process to do the actual download while the parent keeps watching the directory. I am using the Process interface from multiprocessing. The problem I am running into is that unless I call process.join(), the finished child process hangs around; but process.join() is a blocking call, which defeats the purpose of creating a child to handle the download.
My question is: is there a way to join the child process in a non-blocking manner, so the parent can carry on doing its thing?
Partial code:
import os
import time
from multiprocessing import Process

def main(argv):
    # parse command line args
    ...
    # set up variables
    ...
    watch_dir(watch_dir, download_dir)

def watch_dir(wDir, dDir):
    # Grab the current watch directory listing
    before = dict([(f, None) for f in os.listdir(wDir)])
    # Loop FOREVER
    while 1:
        # sleep for 10 secs
        time.sleep(10)
        # Grab the current dir listing
        after = dict([(f, None) for f in os.listdir(wDir)])
        # Get the list of new files
        added = [f for f in after if f not in before]
        # Get the list of deleted files
        removed = [f for f in before if f not in after]
        if added:
            # We have new files, do your stuff
            print("Added:", ", ".join(added))
            # Call the new process for downloading
            p = Process(target=child, args=(added, wDir, dDir))
            p.start()
            p.join()
        if removed:
            # tell the user the file was deleted
            print("Removed:", ", ".join(removed))
        # Set before to the current
        before = after

def child(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # exit cleanly
    os._exit(0)
The parent waits for the child to finish executing and then continues after p.join(), which is (as far as I can tell) correct. But that defeats the whole purpose of creating the child. If I leave out p.join(), the child lingers after it finishes, and ps ax | grep python shows me 'python <defunct>'.
I would like the child to finish what it is doing and go away without holding up the parent. Is there a way to do this?
Answer 0 (score: 14)
You can set up a separate thread to do the joining. Have it listen on a queue into which you push the subprocess handles:
from threading import Thread

class Joiner(Thread):
    def __init__(self, q):
        super().__init__()
        self.__q = q

    def run(self):
        while True:
            child = self.__q.get()
            if child is None:
                return
            child.join()
Then, instead of p.join(), call joinq.put(p), and call joinq.put(None) to signal the thread to stop. Make sure you use a FIFO queue. A minimal wiring sketch follows.
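For concreteness, here is a rough sketch of how the Joiner might be wired into the question's watch_dir() loop; the name joinq comes from the answer's wording, and the rest reuses the question's names:

from queue import Queue

joinq = Queue()          # FIFO queue of child-process handles
joiner = Joiner(joinq)
joiner.start()

# inside the watch loop, instead of p.join():
p = Process(target=child, args=(added, wDir, dDir))
p.start()
joinq.put(p)             # the Joiner thread reaps it when it finishes

# on shutdown:
joinq.put(None)          # tells the Joiner thread to exit
joiner.join()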
Answer 1 (score: 6)
In your while loop, call
multiprocessing.active_children()
which returns a list of all live children of the current process. Calling it has the side effect of "joining" any processes that have already finished.
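Applied to the question's loop, the call slots in like this (a sketch; everything except the active_children() line is the question's own code, minus p.join()):

import multiprocessing

while 1:
    time.sleep(10)
    # side effect: joins (reaps) any children that have already finished
    multiprocessing.active_children()
    ...  # diff the directory listing and p.start() new downloads as before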
Answer 2 (score: 3)
Rather than trying to shoehorn multiprocessing.Process() into working for you, perhaps you should use a different tool, like apply_async() with a multiprocessing.Pool():
import multiprocessing
import os
import time

def main(argv):
    # parse command line args
    ...
    # set up variables
    ...
    # set up multiprocessing Pool
    pool = multiprocessing.Pool()
    try:
        watch_dir(watch_dir, download_dir, pool)
    # catch whatever kind of exception you expect to end your infinite loop
    # you can omit this try/except if you really think your script will
    # run "forever" and you're okay with zombies should it crash
    except KeyboardInterrupt:
        pool.close()
        pool.join()

def watch_dir(wDir, dDir, pool):
    # Grab the current watch directory listing
    before = dict([(f, None) for f in os.listdir(wDir)])
    # Loop FOREVER
    while 1:
        # sleep for 10 secs
        time.sleep(10)
        # Grab the current dir listing
        after = dict([(f, None) for f in os.listdir(wDir)])
        # Get the list of new files
        added = [f for f in after if f not in before]
        # Get the list of deleted files
        removed = [f for f in before if f not in after]
        if added:
            # We have new files, do your stuff
            print("Added:", ", ".join(added))
            # launch the function in a subprocess - this is NON-BLOCKING
            pool.apply_async(child, (added, wDir, dDir))
        if removed:
            # tell the user the file was deleted
            print("Removed:", ", ".join(removed))
        # Set before to the current
        before = after

def child(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # simply return to "exit cleanly"
    return
multiprocessing.Pool() is a pool of worker subprocesses that you can submit "jobs" to. A pool.apply_async() call asks one of those subprocesses to run your function with the given arguments, asynchronously; nothing needs to be joined until your script has finished all of its work and closes the whole pool. The library manages the details for you.
I think this will serve you better than the currently accepted answer, for these reasons:
1. It removes the unnecessary complexity of launching extra threads and queues just to manage subprocesses.
2. It uses library routines that were made specifically for this purpose, so you get the benefit of future library improvements.
3. IMHO, it is much more maintainable.
4. It is more flexible. If you one day decide that you actually want to see a return value from your subprocesses, you can store the return value of the apply_async() call (a result object) and check it whenever you like; you could keep a bunch of them in a list and process them as a batch once the list grows past a certain size (see the sketch after this list). You can move the creation of the pool into the watch_dir() function and do away with the try/except if you really don't care what happens when the "infinite" loop is interrupted. And if you put some kind of break condition into the (presently) infinite loop, you can simply add pool.close() and pool.join() after the loop and everything is cleaned up.
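As a minimal sketch of the result-object idea from point 4 (names like pending are hypothetical, not from the answer):

pending = []  # AsyncResult objects returned by apply_async()

# in the watch loop, keep the handle instead of discarding it:
pending.append(pool.apply_async(child, (added, wDir, dDir)))

# once the list grows past some size, process finished ones as a batch:
if len(pending) > 10:
    still_pending = []
    for r in pending:
        if r.ready():
            r.get()  # returns child()'s value; re-raises any exception it hit
        else:
            still_pending.append(r)
    pending = still_pending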
Answer 3 (score: 2)
If you don't care about when and whether the child terminates, and you just want to avoid the child ending up as a zombie process, then you can do a double fork, so that the grandchild ends up being a child of init. In code:
import os
from multiprocessing import Process

def child(*args):
    p = Process(target=grandchild, args=args)
    p.start()
    os._exit(0)

def grandchild(filename, wDir, dDir):
    # Open filename and extract the url
    ...
    # Download the file to the dDir directory
    ...
    # Delete filename from the watch directory
    ...
    # exit cleanly
    os._exit(0)
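Note that with this approach the p.join() in the question's watch_dir() stops being a problem: child() exits as soon as it has started the grandchild, so the join returns almost immediately while the grandchild carries on with the download. A sketch, reusing the question's names:

# in watch_dir(), unchanged from the question:
p = Process(target=child, args=(added, wDir, dDir))
p.start()
p.join()  # cheap now: only reaps the short-lived intermediate child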
Answer 4 (score: 1)
You can also use multiprocessing.Process with daemon=True (a daemonic process); the process.start() method does not block, so your parent process can keep working without waiting for its child to finish.
The only caveat is that daemonic processes are not allowed to spawn children of their own.
from multiprocessing import Process

# my_func is a stand-in for whatever performs the actual download
child_process = Process(
    target=my_func,
    daemon=True,
)
child_process.start()
# Keep doing your stuff
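One trade-off to be aware of: when the parent process exits, it attempts to terminate all of its daemonic children, so a download still in flight when the watcher shuts down will be killed rather than allowed to finish.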