Python multiprocessing: implementing a parallel pipeline

Date: 2017-02-26 01:17:45

Tags: python performance subprocess pipeline python-multiprocessing

I am trying to build a pipeline, but I am running into nasty exit problems (zombie processes) and performance issues. I wrote this generic class:

from multiprocessing import Process

class Generator(Process):
    '''
    <function>: function to call. A None value means that the current class
        will be used as a template for another class, with <function> being
        defined there
    <input_queues>: Queue or list of Queue objects, which refer to the input
        to <function>
    <output_queues>: Queue or list of Queue objects, which are used to pass
        output
    <sema_to_acquire>: Semaphore or list of Semaphore objects, which block
        generation until released
    <sema_to_release>: Semaphore or list of Semaphore objects, which are
        released after <function> is called
    '''

    def __init__(self, function=None, input_queues=None, output_queues=None,
                 sema_to_acquire=None, sema_to_release=None):
        Process.__init__(self)
        self.input_queues = input_queues
        self.output_queues = output_queues
        self.sema_to_acquire = sema_to_acquire
        self.sema_to_release = sema_to_release
        if function is not None:
            self.function = function

    def run(self):
        if self.sema_to_release is not None:
            try:
                self.sema_to_release.release()
            except AttributeError:
                [sema.release() for sema in self.sema_to_release]

        while True:
            if self.sema_to_acquire is not None:
                try:
                    self.sema_to_acquire.acquire()
                except AttributeError:
                    [sema.acquire() for sema in self.sema_to_acquire]

            if self.input_queues is not None:
                try:
                    data = self.input_queues.get()
                except AttributeError:
                    data = [queue.get() for queue in self.input_queues]
                try:
                    iter(data)
                    res = self.function(*tuple(data))
                except TypeError:
                    res = self.function(data)
            else:
                res = self.function()
            if self.output_queues is not None:
                try:
                    if self.output_queues.full():
                        self.output_queues.get()  # drop the oldest item
                    self.output_queues.put(res)
                except AttributeError:
                    [queue.put(res) for queue in self.output_queues]
            if self.sema_to_release is not None:
                try:
                    self.sema_to_release.release()
                except AttributeError:
                    [sema.release() for sema in self.sema_to_release]

It emulates a worker inside a pipeline. A Generator is meant to run an infinite loop in which a function is executed using input from n queues, and the result is written to m queues. There are some semaphores that need to be acquired by a process before one iteration happens, and some other semaphores are released when the iteration ends. So, for processes that need to run in parallel while one produces input for another, I pass them 'crossed' semaphores as arguments, to force them to perform single iterations together. For processes that do not need to run in parallel, I do not use any semaphores. An example (the one I am actually using, if one ignores the input functions) is the following:

import time
from multiprocessing import Queue, Semaphore, Lock
print_lock = Lock()
_t_=0.5
def func0(data):
    time.sleep(_t_)
    print_lock.acquire()
    print 'func0 sends',data
    print_lock.release()
    return data
def func1(data):
    time.sleep(_t_)
    print_lock.acquire()
    print 'func1 receives and sends',data
    print_lock.release()
    return data
def func2(data):
    time.sleep(_t_)
    print_lock.acquire()
    print 'func2 receives and sends',data
    print_lock.release()
    return data
def func3(*data):
    print_lock.acquire()
    print 'func3 receives',data
    print_lock.release()


run_svm = Semaphore()
run_rf = Semaphore()
inp_rf = Queue()
inp_svm = Queue()
out_rf = Queue()
out_svm = Queue()
kin_stream = Queue()
res_mixed = Queue()
streamproc = Generator(func0,
                       input_queues=kin_stream,
                       output_queues=[inp_rf,
                                       inp_svm])
streamproc.daemon = True
streamproc.start()
svm_class = Generator(func1,
                       input_queues=inp_svm,
                       output_queues=out_svm,
                       sema_to_acquire=run_svm,
                       sema_to_release=run_rf)
svm_class.daemon=True
svm_class.start()
rf_class = Generator(func2,
                      input_queues=inp_rf,
                      output_queues=out_rf,
                      sema_to_acquire=run_rf,
                      sema_to_release=run_svm)
rf_class.daemon=True
rf_class.start()
mixed_class = Generator(func3,
                         input_queues=[out_rf, out_svm])
mixed_class.daemon = True
mixed_class.start()
count = 1
while True:
    kin_stream.put([count])
    count+=1
    time.sleep(1)
streamproc.join()
svm_class.join()
rf_class.join()
mixed_class.join()

This example outputs:

func0 sends 1
func2 receives and sends 1
func1 receives and sends 1
func3 receives (1, 1)
func0 sends 2
func2 receives and sends 2
func1 receives and sends 2
func3 receives (2, 2)
func0 sends 3
func2 receives and sends 3
func1 receives and sends 3
func3 receives (3, 3)
...

Everything works fine. However, if I try to kill main, the other subprocesses are not guaranteed to terminate: the terminal may freeze, or the python interpreter may still be running in the background (probably as a zombie), and I have no idea why this happens, given that I have set the corresponding daemon flags to True. Does anyone have a better idea of how to implement this kind of pipeline, or can suggest a solution to this evil problem? Thank you all.

EDIT

Fixed the test. The zombies still exist.

2 Answers:

Answer 0 (score: 0)

I was able to overcome this problem by introducing a termination queue as an additional argument to the given class and by setting a signal handler for the SIGINT interrupt, in order to stop the pipeline execution. I do not know if this is the most elegant way to get it working, but it works. Also, the way the signal handler is set matters: for some reason it must be set before process.start(); if someone knows why, please comment. Furthermore, the signal handler is inherited by the subprocesses, so I had to put the join inside a try: .. except AssertionError: pass pattern, as it would otherwise throw an error (again, if someone knows how to bypass this, please elaborate). Anyway, it works.
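The inheritance issue can also be sidestepped by ignoring SIGINT in the parent just before start() and restoring it right after, so the children inherit SIG_IGN instead of the custom handler. A minimal Python 3 sketch (the worker and names here are my own illustration, not the code from this answer):

```python
import signal
import multiprocessing as mp

# Use the 'fork' start method so the example stays self-contained on POSIX.
ctx = mp.get_context('fork')

def worker(q):
    # This child inherited SIG_IGN for SIGINT, so Ctrl-C in the
    # terminal will not interrupt it mid-work.
    q.put('done')

q = ctx.Queue()
original = signal.signal(signal.SIGINT, signal.SIG_IGN)  # children inherit this
p = ctx.Process(target=worker, args=(q,))
p.start()
signal.signal(signal.SIGINT, original)  # parent handles Ctrl-C again
result = q.get()
p.join()
print(result)
```

With this pattern only the parent reacts to Ctrl-C, so the join calls in the children no longer race against an inherited handler.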

Source:

class Generator(Process):
    '''
    <term_queue>: Queue to write termination events, must be same for all
                processes spawned
    <function>: function to call. None value means that the current class will
        be used as a template for another class, with <function> being defined
        there
    <input_queues> : Queue or list of Queue objects , which refer to the input
        to <function>.
    <output_queues> : Queue or list of Queue objects , which are used to pass
        output
    <sema_to_acquire> : Semaphore or list of Semaphore objects, which are
        blocking function execution
    <sema_to_release> : Semaphore or list of Semaphore objects, which will be
        released after <function> is called
    '''

    def __init__(self, term_queue,
                 function=None, input_queues=None, output_queues=None, sema_to_acquire=None,
                 sema_to_release=None):
        Process.__init__(self)
        self.term_queue = term_queue
        self.input_queues = input_queues
        self.output_queues = output_queues
        self.sema_to_acquire = sema_to_acquire
        self.sema_to_release = sema_to_release
        if function is not None:
            self.function = function

    def run(self):
        if self.sema_to_release is not None:
            try:
                self.sema_to_release.release()
            except AttributeError:
                [sema.release() for sema in self.sema_to_release]
        while True:
            if not self.term_queue.empty():
                self.term_queue.put((self.name, 0))
                break
            try:
                if self.sema_to_acquire is not None:
                    try:
                        self.sema_to_acquire.acquire()
                    except AttributeError:
                        [sema.acquire() for sema in self.sema_to_acquire]

                if self.input_queues is not None:
                    try:
                        data = self.input_queues.get()
                    except AttributeError:
                        data = tuple([queue.get()
                                      for queue in self.input_queues])
                    res = self.function(data)
                else:
                    res = self.function()
                if self.output_queues is not None:
                    try:
                        if self.output_queues.full():
                            self.output_queues.get()  # drop the oldest item
                        self.output_queues.put(res)
                    except AttributeError:
                        [queue.put(res) for queue in self.output_queues]
                if self.sema_to_release is not None:
                    try:
                        self.sema_to_release.release()
                    except AttributeError:
                        [sema.release() for sema in self.sema_to_release]
            except Exception as exc:
                self.term_queue.put((self.name, exc))
                break



import signal
import sys

def signal_handler(sig, frame, term_queue, processes):
    '''
    <term_queue> is the queue to write termination of the __main__
    <processes> is a dictionary holding all running processes
    '''
    term_queue.put((__name__, 'SIGINT'))
    try:
        [processes[key].join() for key in processes]
    except AssertionError:
        pass
    sys.exit(0)

term_queue = Queue()
'''
initialize some Generators and add them to the <processes> dictionary
'''
signal.signal(signal.SIGINT, lambda sig,frame: signal_handler(sig,frame,
                                                              term_queue,processes))
[processes[key].start() for key in processes]
while True:
    if not term_queue.empty():
        [processes[key].join() for key in processes]
        break

and changed the example accordingly (comment if you want me to add it).

Answer 1 (score: 0)

I have also had to deal with this problem, and indeed passing some communication pipe or queue to the processes seems to be the easiest way to tell them to terminate.
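A minimal sketch of that idea (Python 3; the sentinel value and worker function are my own illustration, not the author's code): the parent pushes a sentinel onto the input queue, and each stage forwards it downstream before exiting, so the whole chain shuts down in order.

```python
import multiprocessing as mp

ctx = mp.get_context('fork')  # POSIX-only shortcut to keep the sketch short

SENTINEL = None  # assumed marker; any value the real data can never take works

def worker(inp, out):
    while True:
        item = inp.get()
        if item is SENTINEL:
            out.put(SENTINEL)  # propagate the shutdown signal downstream
            break
        out.put(item * 2)

inp, out = ctx.Queue(), ctx.Queue()
p = ctx.Process(target=worker, args=(inp, out))
p.start()
for i in range(3):
    inp.put(i)
inp.put(SENTINEL)  # tell the worker to stop
results = []
while (item := out.get()) is not SENTINEL:
    results.append(item)
p.join()  # the worker has exited cleanly, so no zombie is left behind
print(results)
```

Because the sentinel travels through the same queues as the data, no extra synchronization is needed to drain in-flight items before stopping.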

However, the termination code can take advantage of a finally: block in the main process, which will take care of any event, including signals.
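For instance (a hedged sketch, not this answer's actual code), the feeding loop can sit inside a try so that the finally clause always signals termination and joins the worker, whether the loop ends normally, raises, or is interrupted by Ctrl-C:

```python
import multiprocessing as mp

ctx = mp.get_context('fork')  # POSIX fork keeps the example self-contained

def worker(q):
    while q.get() is not None:  # None is an assumed stop marker
        pass

q = ctx.Queue()
p = ctx.Process(target=worker, args=(q,))
p.start()
try:
    for i in range(3):
        q.put(i)
finally:
    # Runs on normal exit, on exceptions, and on KeyboardInterrupt alike.
    q.put(None)  # signal termination
    p.join()     # reap the child so it cannot become a zombie
print('worker alive:', p.is_alive())
```

This avoids installing any signal handler at all: KeyboardInterrupt propagates as a normal exception and still triggers the cleanup.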

If your processes are supposed to terminate at the same time as an object does, you might also want to look at weakref.finalize, but it can be tricky.
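A sketch of how weakref.finalize could tie a worker's shutdown to the lifetime of an owning object (the Pipeline class and helper names are hypothetical, not from the answer):

```python
import weakref
import multiprocessing as mp

ctx = mp.get_context('fork')

def _work(q):
    while q.get() is not None:  # None acts as the stop marker
        pass

def _stop(q, proc):
    q.put(None)
    proc.join()

class Pipeline:
    def __init__(self):
        self.queue = ctx.Queue()
        self.proc = ctx.Process(target=_work, args=(self.queue,))
        self.proc.start()
        # The finalizer callback must not reference self, or it would keep
        # the object alive forever; it closes over the queue and process.
        self._finalizer = weakref.finalize(self, _stop, self.queue, self.proc)

pipe = Pipeline()
pipe.queue.put(1)
pipe._finalizer()  # runs automatically on GC/interpreter exit, or explicitly
print('worker alive:', pipe.proc.is_alive())
```

The tricky part is exactly that constraint: if the finalizer captures a reference back to the owner, the object is never collected and the worker never stops.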