Python multiprocessing subprocesses with ordered printing?

Asked: 2017-06-11 20:50:27

Tags: python multiprocessing stdout

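I'm trying to run some Python functions in parallel, and they have print calls throughout. What I want is for each subprocess, all running the same function, to write to the main stdout in a grouped manner. By that I mean that each subprocess's output should be printed only after it has completed its task. If, however, some kind of error occurs along the way, I still want to output whatever was done in the subprocess up to that point.

A small example: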

from time import sleep
import multiprocessing as mp


def foo(x):
    print('foo')
    for i in range(5):
        print('Process {}: in foo {}'.format(x, i))
        sleep(0.5)


if __name__ == '__main__':
    pool = mp.Pool()

    jobs = []
    for i in range(4):
        job = pool.apply_async(foo, args=[i])
        jobs.append(job)

    for job in jobs:
        job.wait()

This runs in parallel, but the output is:

foo
Process 0: in foo 0
foo
Process 1: in foo 0
foo
Process 2: in foo 0
foo
Process 3: in foo 0
Process 1: in foo 1
Process 0: in foo 1
Process 2: in foo 1
Process 3: in foo 1
Process 1: in foo 2
Process 0: in foo 2
Process 2: in foo 2
Process 3: in foo 2
Process 1: in foo 3
Process 0: in foo 3
Process 3: in foo 3
Process 2: in foo 3
Process 1: in foo 4
Process 0: in foo 4
Process 3: in foo 4
Process 2: in foo 4

What I want is:

foo
Process 3: in foo 0
Process 3: in foo 1
Process 3: in foo 2
Process 3: in foo 3
Process 3: in foo 4
foo
Process 1: in foo 0
Process 1: in foo 1
Process 1: in foo 2
Process 1: in foo 3
Process 1: in foo 4
foo
Process 0: in foo 0
Process 0: in foo 1
Process 0: in foo 2
Process 0: in foo 3
Process 0: in foo 4
foo
Process 2: in foo 0
Process 2: in foo 1
Process 2: in foo 2
Process 2: in foo 3
Process 2: in foo 4

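The particular order of the processes doesn't matter, as long as each subprocess's output is grouped together. Interestingly, I get the output I want if I run

python test.py > output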

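I know that each subprocess doesn't get its own stdout and instead uses the main stdout. I've thought about and looked up some solutions, such as using a Queue: each subprocess gets its own stdout, and we override its flush command so that when the subprocess is done, its output is sent back through the Queue. After that, we can read the contents. Although this does give me what I want, I cannot retrieve the output if the function stops halfway; it is only output after successful completion. I got this from: Access standard output of a sub process in python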

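I've also seen locks used, and that works, but it completely kills running the function in parallel, since each subprocess has to wait for another to finish executing foo.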

Also, if possible, I'd like to avoid changing the implementation of my foo function, as I have many functions that I would need to change.

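EDIT: I've looked at the libraries dispy and Parallel Python. Dispy does exactly what I want: it has a separate stdout/stderr that I can just print out at the end. The problem with dispy is that I have to manually run the server in a separate terminal. I want to be able to run my Python program in one go, without first having to start another script. Parallel Python, on the other hand, also does what I want, but it seems to lack control and has some annoying quirks. In particular, when you print the output, it also prints the function's return type, whereas I only want the output I printed using print. Also, when running a function, you have to give it a list of the modules it uses, which is slightly annoying, since I don't want a big list of imports just to run a simple function.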

1 Answer:

Answer 0 (score: 4)

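As you've noticed, using a lock in this case would kill multiprocessing, because you would essentially have all the processes waiting for the mutex to be released by whichever process currently holds the 'rights' to STDOUT. Running in parallel and printing in sync with your function/subprocess are logically exclusive.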
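To make the cost of the lock concrete, here is a minimal sketch (not from the original question) in which each worker holds a shared `multiprocessing.Lock` for its entire run. The names `locked_worker` and `run_workers` and the timing values are assumptions for illustration. The output comes out grouped per process, but the workers run one after another, so the total time is roughly the sum of the individual run times.

```python
import multiprocessing as mp
import time


def locked_worker(lock, x, delay):
    """Hypothetical worker that holds the lock for its whole run."""
    with lock:  # nobody else can print (or work) until we are done
        for i in range(3):
            print("Process {}: step {}".format(x, i), flush=True)
            time.sleep(delay)


def run_workers(n, delay):
    """Start n locked workers and return the elapsed wall-clock time."""
    lock = mp.Lock()
    procs = [
        mp.Process(target=locked_worker, args=(lock, x, delay))
        for x in range(n)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Each worker sleeps 3 * 0.1 s while holding the lock, so four workers
    # take about as long as running them sequentially.
    elapsed = run_workers(4, 0.1)
    print("elapsed: {:.2f}s".format(elapsed))
```

Moving the `with lock:` block inside the loop would restore parallelism, but the lines from different processes would interleave again, which is the trade-off described above.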

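What you can do instead is have your main process serve as a 'printer' for your subprocesses: once a subprocess finishes or errors, it sends back to the main process what to print, and only then. You seem perfectly content for the printing not to be 'real time' (nor could it be anyway, as mentioned above), so that approach should serve you just right. So: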

import multiprocessing as mp
import random  # just to add some randomness
from time import sleep

def foo(x):
    output = ["[Process {}]: foo:".format(x)]
    for i in range(5):
        output.append('[Process {}] in foo {}'.format(x, i))
        sleep(0.2 + 1 * random.random())
    return "\n".join(output)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for res in pool.imap_unordered(foo, range(4)):
        print("[MAIN]: Process finished, response:")
        print(res)  # this will print as soon as one of the processes finishes/errors
    pool.close()

Which will give you (YMMV, of course):

[MAIN]: Process finished, response:
[Process 2]: foo:
[Process 2] in foo 0
[Process 2] in foo 1
[Process 2] in foo 2
[Process 2] in foo 3
[Process 2] in foo 4
[MAIN]: Process finished, response:
[Process 0]: foo:
[Process 0] in foo 0
[Process 0] in foo 1
[Process 0] in foo 2
[Process 0] in foo 3
[Process 0] in foo 4
[MAIN]: Process finished, response:
[Process 1]: foo:
[Process 1] in foo 0
[Process 1] in foo 1
[Process 1] in foo 2
[Process 1] in foo 3
[Process 1] in foo 4
[MAIN]: Process finished, response:
[Process 3]: foo:
[Process 3] in foo 0
[Process 3] in foo 1
[Process 3] in foo 2
[Process 3] in foo 3
[Process 3] in foo 4

You can observe anything else the same way, including errors.
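As a minimal sketch of the 'including errors' case: the worker below collects its lines just like foo() above, but wraps its body in try/except so that whatever was collected before a failure is still returned, together with the traceback text. The name `foo_with_errors` and the choice to fail on odd inputs are assumptions made purely for illustration.

```python
import multiprocessing as mp
import traceback


def foo_with_errors(x):
    """Hypothetical worker: collect lines and return them even on failure."""
    output = ["[Process {}]: foo:".format(x)]
    try:
        for i in range(3):
            output.append("[Process {}] in foo {}".format(x, i))
            if x % 2 and i == 1:  # assumption: odd inputs fail on purpose
                raise ValueError("[Process {}] simulated failure".format(x))
    except Exception:
        # Append the traceback so the main process prints it with the rest.
        output.append(traceback.format_exc().rstrip())
    return "\n".join(output)


if __name__ == "__main__":
    with mp.Pool(4) as pool:
        for res in pool.imap_unordered(foo_with_errors, range(4)):
            print("[MAIN]: Process finished, response:")
            print(res)
```

Because the exception is handled inside the worker, the main process receives an ordinary string either way and does not need its own error handling around the loop.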

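UPDATE - If you have to use functions whose output you cannot control, you can instead wrap your subprocesses and capture their STDOUT/STDERR, and then, once they finish (or raise an exception), return everything to the process 'manager' for printing to the actual STDOUT. With such a setup, we can have foo() like: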

def foo(x):
    print("[Process {}]: foo:".format(x))
    for i in range(5):
        print('[Process {}] in foo {}'.format(x, i))
        sleep(0.2 + 1 * random.random())
        if random.random() < 0.0625:  # 1/16 chance per iteration, roughly 1/4 per call of foo()
            raise Exception("[Process {}] A random exception is random!".format(x))
    return random.random() * 100  # just a random response, you can omit it

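Note that it is blissfully unaware that something is trying to mess with its mode of operation. We'll then create an external, general-purpose wrapper (so you don't have to change it depending on the function) that actually messes with its default behavior (and not just that of this function, but of everything else it might call while running):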

def std_wrapper(args):
    try:
        from StringIO import StringIO  # ... for Python 2.x compatibility
    except ImportError:
        from io import StringIO
    import sys
    sys.stdout, sys.stderr = StringIO(), StringIO()  # replace stdout/err with our buffers
    # args is a list packed as: [0] process function name; [1] args; [2] kwargs; lets unpack:
    process_name = args[0]
    process_args = args[1] if len(args) > 1 else []
    process_kwargs = args[2] if len(args) > 2 else {}
    # get our method from its name, assuming global namespace of the current module/script
    process = globals()[process_name]
    response = None  # in case a call fails
    try:
        response = process(*process_args, **process_kwargs)  # call our process function
    except Exception as e:  # too broad but good enough as an example
        print(e)
    # rewind our buffers:
    sys.stdout.seek(0)
    sys.stderr.seek(0)
    # return everything packed as STDOUT, STDERR, PROCESS_RESPONSE | NONE
    return sys.stdout.read(), sys.stderr.read(), response

Now we just need to call this wrapper instead of the desired foo(), and give it the information on what to call on our behalf:

if __name__ == '__main__':
    pool = mp.Pool(4)
    # since we're wrapping the process we're calling, we need to send to the wrapper packed
    # data with instructions on what to call on our behalf.
    # info on args packing available in the std_wrapper function above.
    for out, err, res in pool.imap_unordered(std_wrapper, [("foo", [i]) for i in range(4)]):
        print("[MAIN]: Process finished, response: {}, STDOUT:".format(res))
        print(out.rstrip())  # remove the trailing space for niceness, print err if you want
    pool.close()

So now if you run it, you'll get something like this:

[MAIN]: Process finished, response: None, STDOUT:
[Process 2]: foo:
[Process 2] in foo 0
[Process 2] in foo 1
[Process 2] A random exception is random!
[MAIN]: Process finished, response: 87.9658471743586, STDOUT:
[Process 1]: foo:
[Process 1] in foo 0
[Process 1] in foo 1
[Process 1] in foo 2
[Process 1] in foo 3
[Process 1] in foo 4
[MAIN]: Process finished, response: 38.929554421661194, STDOUT:
[Process 3]: foo:
[Process 3] in foo 0
[Process 3] in foo 1
[Process 3] in foo 2
[Process 3] in foo 3
[Process 3] in foo 4
[MAIN]: Process finished, response: None, STDOUT:
[Process 0]: foo:
[Process 0] in foo 0
[Process 0] in foo 1
[Process 0] in foo 2
[Process 0] in foo 3
[Process 0] in foo 4
[Process 0] A random exception is random!

even though foo() just prints away or errors. Of course, you can use such a wrapper to call any function and pass any number of args/kwargs to it.
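On Python 3.5 and later, the same capture can be sketched with the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which restore the original streams when the block exits. This is an alternative, not the code from the answer: `capture_call` and `greet` are hypothetical names, and the function object is passed directly (top-level functions pickle by reference) instead of being looked up by name in `globals()`.

```python
import contextlib
import io
import multiprocessing as mp


def capture_call(packed):
    """Hypothetical wrapper: run func(*args, **kwargs) and capture its output.

    `packed` is (func, args, kwargs); args and kwargs are optional.
    Returns (stdout_text, stderr_text, response_or_None).
    """
    func = packed[0]
    args = packed[1] if len(packed) > 1 else ()
    kwargs = packed[2] if len(packed) > 2 else {}
    out, err = io.StringIO(), io.StringIO()
    response = None  # stays None if the call fails
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            response = func(*args, **kwargs)
        except Exception as e:  # broad on purpose, as in the wrapper above
            print(e)  # the message lands in the captured stdout
    return out.getvalue(), err.getvalue(), response


def greet(name, punctuation="!"):
    """Hypothetical target function that only prints."""
    print("hello {}{}".format(name, punctuation))
    return len(name)


if __name__ == "__main__":
    jobs = [(greet, ["a"]), (greet, ["bb"], {"punctuation": "?"})]
    with mp.Pool(2) as pool:
        for out, err, res in pool.imap_unordered(capture_call, jobs):
            print("[MAIN]: response: {}, STDOUT:".format(res))
            print(out.rstrip())
```

Like the wrapper above, this only captures text written through Python's `sys.stdout` and `sys.stderr`; output written directly to the underlying file descriptors, for example by a C extension or a child program, is not intercepted.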

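UPDATE #2 - But wait! If we can wrap our function processes like this and capture their STDOUT/STDERR, we can surely turn this into a decorator and use it in our code with a simple decoration. So, for my final proposal: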

import functools
import multiprocessing
import random  # just to add some randomness
import time

def std_wrapper(func):
    @functools.wraps(func)  # we need this to unravel the target function name
    def caller(*args, **kwargs):  # and now for the wrapper, nothing new here
        try:
            from StringIO import StringIO  # ... for Python 2.x compatibility
        except ImportError:
            from io import StringIO
        import sys
        sys.stdout, sys.stderr = StringIO(), StringIO()  # use our buffers instead
        response = None  # in case a call fails
        try:
            response = func(*args, **kwargs)  # call our wrapped process function
        except Exception as e:  # too broad but good enough as an example
            print(e)  # NOTE: the exception is also printed to the captured STDOUT
        # rewind our buffers:
        sys.stdout.seek(0)
        sys.stderr.seek(0)
        # return everything packed as STDOUT, STDERR, PROCESS_RESPONSE | NONE
        return sys.stdout.read(), sys.stderr.read(), response
    return caller

@std_wrapper  # decorate any function, it won't know you're siphoning its STDOUT/STDERR
def foo(x):
    print("[Process {}]: foo:".format(x))
    for i in range(5):
        print('[Process {}] in foo {}'.format(x, i))
        time.sleep(0.2 + 1 * random.random())
        if random.random() < 0.0625:  # 1/16 chance per iteration, roughly 1/4 per call of foo()
            raise Exception("[Process {}] A random exception is random!".format(x))
    return random.random() * 100  # just a random response, you can omit it

Now we can call our wrapped functions as before without dealing with argument packing or anything of the sort, so we're back at:

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    for out, err, res in pool.imap_unordered(foo, range(4)):
        print("[MAIN]: Process finished, response: {}, STDOUT:".format(res))
        print(out.rstrip())  # remove the trailing space for niceness, print err if you want
    pool.close()

The output is the same as in the previous example, but in a much nicer and more manageable package.